logml.data.datasets.base
Functions
abstract_field() | Declare abstract field, which is to be set by an inheritor.
Classes
BaseDataset | Base dataset: provides dataframe with metadata.
CrossValidationMixin | Defines cross-validation behaviour of dataset.
- logml.data.datasets.base.abstract_field() property
Declare abstract field, which is to be set by an inheritor.
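The idea of an abstract field can be illustrated with a minimal sketch. It is not the actual logml implementation, which is not shown here: `abstract_field_sketch`, `AbstractFieldNotSetError`, `DatasetBase` and `MyDataset` are hypothetical names, and the behaviour (raising until an inheritor overrides the attribute) is an assumption.

```python
class AbstractFieldNotSetError(NotImplementedError):
    """Raised when an inheritor has not set a required field (hypothetical)."""


def abstract_field_sketch(name):
    """Hypothetical stand-in: a property that fails until it is overridden."""

    def getter(self):
        raise AbstractFieldNotSetError(
            f"{type(self).__name__} must define the field '{name}'"
        )

    return property(getter)


class DatasetBase:
    # Placeholder: inheritors are expected to replace this with a real value.
    LABEL = abstract_field_sketch("LABEL")


class MyDataset(DatasetBase):
    # The inheritor sets the field with a plain class attribute.
    LABEL = "my_dataset"
```

Reading `MyDataset().LABEL` returns the overriding value, while reading `DatasetBase().LABEL` raises, which makes a missing field visible at the first access rather than silently returning a default.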
- class logml.data.datasets.base.BaseDataset(*dont_use_positional_args, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None, logger=None, generator_type: str = 'plain', **kwargs)
Bases: object
Base dataset: provides dataframe with metadata.
- LABEL = 'base_dataset'
- validate_metadata(raise_error=False) None
Validates dataset metadata.
- Raises
ValueError – if columns listed in the metadata are not present in the dataframe.
- property dataframe: pandas.core.frame.DataFrame
Get underlying pandas dataframe.
- get_hash() str
Return a hash for the dataframe.
NOTE: the hash does not survive serialization. After you dump and load a dataset, the hash of the loaded dataset will differ from the original one.
- dump(path: Union[str, pathlib.Path], metadata_path: Optional[Union[str, pathlib.Path]] = None) None
Saves dataset to disk.
- classmethod load(path: Union[str, pathlib.Path], metadata_path: Optional[Union[str, pathlib.Path]] = None) logml.data.datasets.base.BaseDataset
Load a dataset from a pair of files (data + metadata).
- get_features_list() List[str]
Return a list of ‘feature’ columns. For the base dataset, this is all columns except the special ones.
- get_features_dataframe(set_index=True) pandas.core.frame.DataFrame
Return subset of current dataframe with feature columns.
- get_targets_dataframe(set_index=True) pandas.core.frame.DataFrame
Return subset of current dataframe with target columns.
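The split into feature and target columns described above can be sketched with plain pandas. This is an illustration under stated assumptions, not the logml code: the column names, the choice of which columns count as ‘special’, and the reading of `set_index=True` as "index the result by an identifier column" are all hypothetical.

```python
import pandas as pd

# Hypothetical frame: 'sample_id' is an identifier column, 'outcome' a target.
df = pd.DataFrame(
    {
        "sample_id": ["a", "b", "c"],
        "age": [31, 45, 27],
        "dose": [0.5, 1.0, 0.25],
        "outcome": [0, 1, 0],
    }
)

# Assumed stand-ins for what the dataset metadata would declare.
index_column = "sample_id"
target_columns = ["outcome"]
special_columns = {index_column, *target_columns}

# Features: every column that is not special, in the original order.
feature_columns = [c for c in df.columns if c not in special_columns]

# Assumed analogue of set_index=True: index both subsets by the id column.
features = df.set_index(index_column)[feature_columns]
targets = df.set_index(index_column)[target_columns]
```

In the real class, the list of special columns would presumably come from `dataset_metadata` rather than from hard-coded names.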
- class logml.data.datasets.base.CrossValidationMixin(cross_validator: Optional[Union[sklearn.model_selection._split.BaseCrossValidator, Iterable]] = None, **kwargs)
Bases: abc.ABC
Defines cross-validation behaviour of dataset.
- abstract property cv_dataframe: pandas.core.frame.DataFrame
Returns CV dataframe.
- abstract property cv_features: numpy.array
Returns features from CV dataframe.
- abstract property cv_targets: numpy.array
Returns targets from CV dataframe.
- property n_folds: int
Get number of CV folds.
- get_cv_generator() Iterator[Tuple[numpy.ndarray, numpy.ndarray]]
Returns an iterable of CV train/test indices.
- Yields
tuple – Pair of train and test indices: (train, test).
- get_folds_generator() Iterator[Tuple[Tuple[numpy.ndarray, numpy.ndarray], Tuple[numpy.ndarray, numpy.ndarray]]]
Returns an iterable of CV train/test data arrays (as opposed to indices).
- Yields
tuple(tuple(x_train, y_train), tuple(x_test, y_test)) – Train-test folds, split to X and y parts.
- set_cv(cross_validator)
Update the cross-validator used by the dataset.
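The relation between the index pairs yielded by `get_cv_generator()` and the data folds yielded by `get_folds_generator()` can be sketched with NumPy alone. The signature above allows `cross_validator` to be a plain iterable, so the sketch uses an explicit list of (train, test) index pairs. `folds_from_indices` is a hypothetical helper, and the toy arrays stand in for `cv_features` and `cv_targets`; the actual mixin may build its folds differently.

```python
import numpy as np

# Toy data: 6 samples with 2 features each, and one target per sample.
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])

# An explicit iterable of (train, test) index pairs describing 3 folds.
cv_indices = [
    (np.array([2, 3, 4, 5]), np.array([0, 1])),
    (np.array([0, 1, 4, 5]), np.array([2, 3])),
    (np.array([0, 1, 2, 3]), np.array([4, 5])),
]


def folds_from_indices(features, targets, index_pairs):
    """Hypothetical helper: yield ((x_train, y_train), (x_test, y_test))."""
    for train_idx, test_idx in index_pairs:
        yield (
            (features[train_idx], targets[train_idx]),
            (features[test_idx], targets[test_idx]),
        )


folds = list(folds_from_indices(X, y, cv_indices))
n_folds = len(folds)
```

Each element of `folds` has the shape documented for `get_folds_generator()`: a pair of (X, y) tuples, first for training and then for testing.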