logml.data.presets
Functions
- logml.data.presets.generate_preprocessing_preset_steps(objective_metadata: logml.configuration.modeling.ModelingTaskSpec, params: logml.configuration.modeling.DatasetPreprocessingPresetSection, rnd: logml.common.RandomGen)
Defines the simplest data preprocessing pipeline.
The following steps are considered:
- selection of features of interest
- removal of rows with an undefined target
- removal of rows with too many NaNs
- removal of features with too many NaNs
- normalization of numerical features (performed before imputation, for stability)
- imputation of missing numerical values
- imputation of missing categorical values
- encoding of categorical features
- removal of correlated features
- target preprocessing:
  - log1p transform, if specified
  - label encoding, for classification tasks
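
The ordering above can be illustrated with a minimal sketch. It is not the logml implementation: `PresetParams`, `sketch_preset_steps`, the step names, and the threshold defaults are hypothetical stand-ins invented for this example, and the real `DatasetPreprocessingPresetSection` fields may differ. The sketch only shows how an ordered list of steps could be assembled, with the target steps added conditionally.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PresetParams:
    """Hypothetical stand-in for the preset configuration (not a logml class)."""

    features_of_interest: Optional[List[str]] = None
    nan_row_threshold: float = 0.5       # placeholder value, an assumption
    nan_feature_threshold: float = 0.5   # placeholder value, an assumption
    correlation_threshold: float = 0.9   # placeholder value, an assumption
    apply_log1p_to_target: bool = False


def sketch_preset_steps(task: str, params: PresetParams) -> List[Dict[str, object]]:
    """Return step descriptions in the order given by the list above."""
    steps: List[Dict[str, object]] = [
        {"step": "select_features", "features": params.features_of_interest},
        {"step": "drop_rows_with_undefined_target"},
        {"step": "drop_rows_with_too_many_nans", "threshold": params.nan_row_threshold},
        {"step": "drop_features_with_too_many_nans", "threshold": params.nan_feature_threshold},
        # Normalization is placed before imputation, as the list above states.
        {"step": "normalize_numericals"},
        {"step": "impute_numericals"},
        {"step": "impute_categoricals"},
        {"step": "encode_categoricals"},
        {"step": "drop_correlated_features", "threshold": params.correlation_threshold},
    ]
    # Target preprocessing: log1p only if requested, label encoding only for classification.
    if params.apply_log1p_to_target:
        steps.append({"step": "log1p_target"})
    if task == "classification":
        steps.append({"step": "label_encode_target"})
    return steps


classification_steps = [s["step"] for s in sketch_preset_steps("classification", PresetParams())]
regression_steps = [
    s["step"]
    for s in sketch_preset_steps("regression", PresetParams(apply_log1p_to_target=True))
]
```

In this sketch a classification task ends with the label-encoding step, while a regression task with `apply_log1p_to_target=True` ends with the log1p step; the nine feature-level steps are the same in both cases.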