logml.data.generators

Functions

generate_sa_dataset_for_cox(cfg, ...[, ...])

Appends additional steps to the preprocessing pipeline and returns the resulting survival analysis dataset.

generate_survival_analysis_dataset(cfg, ...)

Generates a dataset for the survival analysis module.

Classes

DatasetGenerator([dataset_metadata, ...])

Basic dataset generator: applies the preprocessing pipeline to the complete dataset.

class logml.data.generators.DatasetGenerator(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, objective_cfg: Optional[logml.configuration.modeling.ModelingTaskSpec] = None, cv_setup: Optional[logml.configuration.cross_validation.CrossValidationSection] = None, dataset_cls: Optional[Type[logml.data.datasets.base.BaseDataset]] = None, data_pipeline: Optional[logml.data.pipeline.PreprocessingPipeline] = None, logger=None, **kwargs)

Bases: object

Basic dataset generator: applies the preprocessing pipeline to the complete dataset.

LABEL = 'plain'
run(dataframe: Optional[pandas.core.frame.DataFrame] = None) → logml.data.datasets.base.BaseDataset

Executes the generation procedure.

Parameters

dataframe – Incoming “raw” dataframe.

Returns

Dataset object.

get_default_ds_type()
logml.data.generators.generate_survival_analysis_dataset(cfg: GlobalConfig, global_params: Dict, sa_setup: logml.configuration.survival_analysis.SurvivalAnalysisSetup) → logml.data.datasets.survival_dataset.SurvivalDataset

Generates a dataset for the survival analysis module.

logml.data.generators.generate_sa_dataset_for_cox(cfg: GlobalConfig, global_params: Dict, sa_setup: logml.configuration.survival_analysis.SurvivalAnalysisSetup, column_names: List[str], normalize_numericals: bool = False, thresholds_mapping: Dict[str, float] = None)

Appends additional steps to the preprocessing pipeline and returns the resulting survival analysis dataset.

The preprocessing pipeline is adjusted to include the following steps:
  • select only target features + survival target

  • normalize numericals if requested (normalize_numericals); otherwise apply the optional thresholds_mapping, if given, to binarize numericals

  • one-hot encoding for categoricals

  • missing values imputation
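
The four steps above can be illustrated with a minimal sketch in plain pandas. This is not logml's implementation: the function name prepare_cox_frame, the time_col and event_col parameters, the choice of z-score normalization, and the choice of median imputation are all assumptions made for illustration, since the documentation does not say which normalization or imputation method the pipeline uses.

```python
from typing import Dict, List, Optional

import pandas as pd


def prepare_cox_frame(
    df: pd.DataFrame,
    column_names: List[str],
    time_col: str,    # hypothetical: name of the survival time column
    event_col: str,   # hypothetical: name of the event indicator column
    normalize_numericals: bool = False,
    thresholds_mapping: Optional[Dict[str, float]] = None,
) -> pd.DataFrame:
    """Illustrative stand-in for the adjusted preprocessing steps."""
    # Step 1: keep only the requested features plus the survival target.
    out = df[list(column_names) + [time_col, event_col]].copy()

    numeric = [c for c in column_names if pd.api.types.is_numeric_dtype(out[c])]
    categorical = [c for c in column_names if c not in numeric]

    # Step 2: either normalize numericals (z-score, an assumption)
    # or binarize them with the optional thresholds.
    if normalize_numericals:
        for c in numeric:
            std = out[c].std()
            if std > 0:
                out[c] = (out[c] - out[c].mean()) / std
    elif thresholds_mapping:
        for c, threshold in thresholds_mapping.items():
            if c in numeric:
                # 1.0 above the threshold, 0.0 otherwise; missing stays missing.
                out[c] = (out[c] > threshold).astype(float).where(out[c].notna())

    # Step 3: one-hot encode categoricals.
    if categorical:
        out = pd.get_dummies(out, columns=categorical, dtype=float)

    # Step 4: impute missing feature values (median, an assumption).
    feature_cols = [c for c in out.columns if c not in (time_col, event_col)]
    out[feature_cols] = out[feature_cols].fillna(out[feature_cols].median())
    return out


# Toy data, invented for this example only.
df = pd.DataFrame(
    {
        "age": [50.0, 60.0, None, 70.0],
        "stage": ["I", "II", "II", "I"],
        "unused": [1, 2, 3, 4],
        "time": [5.0, 8.0, 12.0, 3.0],
        "event": [1, 0, 1, 1],
    }
)

result = prepare_cox_frame(
    df, ["age", "stage"], "time", "event", thresholds_mapping={"age": 55.0}
)
normalized = prepare_cox_frame(
    df, ["age", "stage"], "time", "event", normalize_numericals=True
)
```

In this sketch the thresholds are ignored whenever normalize_numericals is True, which mirrors the "otherwise" wording of the list above.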