logml.data.manager

Classes

DatasetsManager(cfg, global_params[, ...])

Utility for datasets generation and handling.

class logml.data.manager.DatasetsManager(cfg: logml.GlobalConfig, global_params: dict, logger=None, sequential_naming=False, random_state=None, debug=False)

Bases: object

Utility for datasets generation and handling.

validate_ds_type(ds_type: str) → str

generate_dataset(shuffle: bool = False, sequence_no=0, ds_type: Optional[str] = None) → str

Reads a dataframe and generates a dataset from it.

list_datasets() → List[str]

Returns all available datasets of a given type.

get_dataset(dataset_hash: str) → logml.data.datasets.base.BaseDataset

Returns a requested dataset.

dump_dataset(dataset: logml.data.datasets.base.BaseDataset, sequence_no=0, transform_log=None, corr_groups: Optional[pandas.core.frame.DataFrame] = None) → None

Dumps dataset and accompanying artifacts.

generate_datasets(ds_ids: Optional[List[int]] = None) → List[str]

Generates a number of datasets for a given type (based on config).

Datasets are pickled into the DatasetsOutputStructure.datasets_path folder, using each dataset's hash as the filename.

Returns

List of generated dataset hashes.
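
The hash-as-filename storage scheme described above can be illustrated with a minimal, standalone sketch. It does not use logml and is not its implementation: the helper names (dump_by_hash, load_by_hash, list_hashes), the SHA-256 content hash, and the ".pkl" extension are all assumptions made for illustration.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path
from typing import Any, List


def dump_by_hash(obj: Any, datasets_path: Path) -> str:
    """Pickle obj into datasets_path and return the hash used as the filename."""
    payload = pickle.dumps(obj)
    # Assumption: a SHA-256 hash of the pickled bytes; logml may hash differently.
    dataset_hash = hashlib.sha256(payload).hexdigest()
    datasets_path.mkdir(parents=True, exist_ok=True)
    # Assumption: a ".pkl" extension; the real naming convention is not documented here.
    (datasets_path / f"{dataset_hash}.pkl").write_bytes(payload)
    return dataset_hash


def load_by_hash(dataset_hash: str, datasets_path: Path) -> Any:
    """Load a previously dumped object back by its hash."""
    return pickle.loads((datasets_path / f"{dataset_hash}.pkl").read_bytes())


def list_hashes(datasets_path: Path) -> List[str]:
    """Return the hashes of all pickled objects found in the folder."""
    return sorted(p.stem for p in datasets_path.glob("*.pkl"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp) / "datasets"
        h = dump_by_hash({"rows": [1, 2, 3]}, folder)
        print(h in list_hashes(folder))
        print(load_by_hash(h, folder))
```

In this sketch the hash doubles as a lookup key, which mirrors how generate_datasets returns hashes that can later be passed to get_dataset.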

dump_debug_data(name, steps)