logml.data.generators
Functions
generate_sa_dataset_for_cox | Appends additional steps to the preprocessing pipeline and returns the resulting survival analysis dataset.
generate_survival_analysis_dataset | Generates a dataset for the survival analysis module.
Classes
DatasetGenerator | Basic dataset generator: applies the preprocessing pipeline to the complete dataset.
- class logml.data.generators.DatasetGenerator(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, objective_cfg: Optional[logml.configuration.modeling.ModelingTaskSpec] = None, cv_setup: Optional[logml.configuration.cross_validation.CrossValidationSection] = None, dataset_cls: Optional[Type[logml.data.datasets.base.BaseDataset]] = None, data_pipeline: Optional[logml.data.pipeline.PreprocessingPipeline] = None, logger=None, **kwargs)
Bases: object
Basic dataset generator: applies the preprocessing pipeline to the complete dataset.
- LABEL = 'plain'
- run(dataframe: Optional[pandas.core.frame.DataFrame] = None) → logml.data.datasets.base.BaseDataset
Execute generation procedure.
- Parameters
dataframe – Incoming “raw” dataframe.
- Returns
Dataset object.
- get_default_ds_type()
- logml.data.generators.generate_survival_analysis_dataset(cfg: GlobalConfig, global_params: Dict, sa_setup: logml.configuration.survival_analysis.SurvivalAnalysisSetup) → logml.data.datasets.survival_dataset.SurvivalDataset
Generates a dataset for the survival analysis module.
- logml.data.generators.generate_sa_dataset_for_cox(cfg: GlobalConfig, global_params: Dict, sa_setup: logml.configuration.survival_analysis.SurvivalAnalysisSetup, column_names: List[str], normalize_numericals: bool = False, thresholds_mapping: Dict[str, float] = None)
Appends additional steps to the preprocessing pipeline and returns the resulting survival analysis dataset.
The preprocessing pipeline is adjusted to include the following steps:
- Select only the target features and the survival target.
- Normalize numerical features if requested; otherwise, apply the optional threshold mapping to binarize them.
- One-hot encode categorical features.
- Impute missing values.
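The four steps above can be illustrated with a plain pandas sketch. This is not the logml implementation: the function `prepare_for_cox`, its `survival_columns` parameter, the column names in the sample data, the z-score normalization, the "greater than threshold" binarization rule, and the median imputation are all assumptions chosen for illustration. Only the order of the steps and the parameter names `column_names`, `normalize_numericals`, and `thresholds_mapping` come from the signature documented above.

```python
from typing import Dict, List, Optional

import pandas as pd


def prepare_for_cox(
    df: pd.DataFrame,
    column_names: List[str],
    survival_columns: List[str],  # hypothetical: names of the survival target columns
    normalize_numericals: bool = False,
    thresholds_mapping: Optional[Dict[str, float]] = None,
) -> pd.DataFrame:
    """Illustrative stand-in for the four steps; not the logml code."""
    # 1. Keep only the requested features; the survival target is re-attached at the end.
    features = df[column_names].copy()
    target = df[survival_columns]

    numericals = features.select_dtypes(include="number").columns
    if normalize_numericals:
        # 2a. Z-score normalization (assumed; the library may normalize differently).
        features[numericals] = (
            features[numericals] - features[numericals].mean()
        ) / features[numericals].std()
    elif thresholds_mapping:
        # 2b. Binarize: 1.0 above the threshold, 0.0 otherwise; missing values stay missing.
        for name, threshold in thresholds_mapping.items():
            col = features[name]
            features[name] = (col > threshold).astype(float).where(col.notna())

    # 3. One-hot encode the non-numerical (categorical) columns.
    categoricals = features.select_dtypes(exclude="number").columns
    features = pd.get_dummies(features, columns=list(categoricals), dtype=float)

    # 4. Impute remaining missing values with the column median (assumed strategy).
    features = features.fillna(features.median())

    return pd.concat([features, target], axis=1)


# Hypothetical sample data: two features, one unused column, and a time/event target.
raw = pd.DataFrame(
    {
        "age": [50.0, 60.0, None, 70.0],
        "stage": ["I", "II", "II", None],
        "unused": [1, 2, 3, 4],
        "time": [5.0, 8.0, 12.0, 3.0],
        "event": [1, 0, 1, 1],
    }
)
prepared = prepare_for_cox(
    raw,
    column_names=["age", "stage"],
    survival_columns=["time", "event"],
    thresholds_mapping={"age": 55.0},
)
print(prepared)
```

In this sketch, binarization runs before imputation, so a missing `age` is filled with the median of the already binarized column rather than with a raw age. Whether the library resolves that interaction the same way is not stated in the documentation above.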