logml.data.presets

Functions

generate_preprocessing_preset_steps(...)

Defines the simplest data preprocessing pipeline.

logml.data.presets.generate_preprocessing_preset_steps(objective_metadata: logml.configuration.modeling.ModelingTaskSpec, params: logml.configuration.modeling.DatasetPreprocessingPresetSection, rnd: logml.common.RandomGen)

Defines the simplest data preprocessing pipeline.

The following steps are applied, in order:

  • selection of features of interest

  • removal of rows with undefined target

  • removal of rows with too many NaNs

  • removal of features with too many NaNs

  • normalization of numerical features (performed before imputation for stability)

  • imputation of missing numerical values

  • imputation of missing categorical values

  • encoding of categorical features

  • removal of correlated features

  • target preprocessing:
    • log1p transform, if specified

    • label encoding, for classification tasks.
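
The ordering of the steps above can be illustrated with a small, self-contained sketch. This is not logml's implementation: PresetParams, sketch_preset_steps, the task strings, and the step labels are all hypothetical names chosen for illustration, and the real function takes a ModelingTaskSpec, a DatasetPreprocessingPresetSection, and a RandomGen. The sketch only shows how such a preset might assemble an ordered list of steps, with the target steps added conditionally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PresetParams:
    """Hypothetical stand-in for the preset options; not logml's real class."""
    apply_log1p_to_target: bool = False


def sketch_preset_steps(task: str, params: PresetParams) -> List[str]:
    """Return illustrative step labels in the order documented above.

    `task` is assumed to be either "classification" or "regression".
    """
    steps = [
        "select_features",
        "drop_rows_with_undefined_target",
        "drop_rows_with_too_many_nans",
        "drop_features_with_too_many_nans",
        "normalize_numericals",  # placed before imputation, as documented
        "impute_numericals",
        "impute_categoricals",
        "encode_categoricals",
        "drop_correlated_features",
    ]
    # Target preprocessing is conditional on the options and the task type.
    if params.apply_log1p_to_target:
        steps.append("log1p_target")
    if task == "classification":
        steps.append("label_encode_target")
    return steps


if __name__ == "__main__":
    for name in sketch_preset_steps("classification", PresetParams()):
        print(name)
```

Representing the preset as a plain ordered list keeps the sequence easy to inspect and makes the conditional target steps explicit.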