logml.data.transformers.imputation

Classes

MICEImputeTransformer(*args, **kwargs)

Provides MICE imputation functionality (Multivariate Imputation by Chained Equations).

SimpleImputeTransformer(**kwargs)

Provides simple imputation functionality (mean, median, most frequent value, or constant fill).

class logml.data.transformers.imputation.SimpleImputerParams

Bases: logml.data.config.BaseTransformerParams

JSON schema:
{
   "title": "SimpleImputerParams",
   "description": "Defines schema for transformer params.\n\nColumn inclusion/exclusion scheme (see also `get_affected_columns`):\n\n- build a set as the union of all columns that match the `include_columns` filter.\n- subtract the columns that match the `exclude_columns` filter.\n\nFiltering expressions are identified by prefix:\n\n- 're:' or empty - regular expression. Any valid Python regular expression, e.g. \".*_DNA$\"\n- 'g:' - column group filter. Must match the group name exactly, e.g. \"g:clinical_data\".\n- '$' - keyword:\n    - $features - all features (input columns, covariates).\n    - $numeric_features - only numeric features.\n    - $cat_features - only categorical features.\n    - $target - target feature. (For survival problems this is two columns: time and event.)\n    - $all - all columns except key columns.\n\nIf no known prefix is detected, the filter is treated as a regular expression.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "strategy": {
         "title": "Strategy",
         "description": "(mean, median, most_frequent, constant)",
         "default": "mean",
         "type": "string"
      },
      "fill_value": {
         "title": "Fill Value"
      }
   }
}
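
The inclusion/exclusion rules in the schema description can be sketched in plain Python. This is a minimal illustration, not logml's `get_affected_columns`: the function `resolve_columns`, the `groups` and `keywords` dictionaries, and the use of `re.match` for regular-expression filters are all assumptions made for the example.

```python
import re
from typing import Dict, List, Optional, Set


def _match_one(expr: str, columns: List[str],
               groups: Dict[str, List[str]],
               keywords: Dict[str, List[str]]) -> Set[str]:
    """Resolve a single filtering expression to a set of column names."""
    if expr.startswith("g:"):
        # Group filter: the group name must match exactly.
        return set(groups.get(expr[2:], []))
    if expr.startswith("$"):
        # Keyword filter, e.g. "$target" (the mapping is hypothetical here).
        return set(keywords.get(expr, []))
    # 're:' prefix or no known prefix: treat the expression as a regex.
    pattern = expr[3:] if expr.startswith("re:") else expr
    regex = re.compile(pattern)
    return {c for c in columns if regex.match(c)}


def resolve_columns(columns: List[str], include: List[str], exclude: List[str],
                    groups: Optional[Dict[str, List[str]]] = None,
                    keywords: Optional[Dict[str, List[str]]] = None) -> List[str]:
    """Union of all include matches minus the union of all exclude matches."""
    groups = groups or {}
    keywords = keywords or {}
    included: Set[str] = set()
    for expr in include:
        included |= _match_one(expr, columns, groups, keywords)
    excluded: Set[str] = set()
    for expr in exclude:
        excluded |= _match_one(expr, columns, groups, keywords)
    # Keep the original column order in the result.
    return [c for c in columns if c in included and c not in excluded]


columns = ["age", "TP53_DNA", "KRAS_DNA", "sex", "os_time"]
groups = {"clinical_data": ["age", "sex"]}   # hypothetical group metadata
keywords = {"$target": ["os_time"]}          # hypothetical keyword mapping

dna_only = resolve_columns(columns, [".*"], ["g:clinical_data", "$target"],
                           groups, keywords)
dna_and_age = resolve_columns(columns, ["re:.*_DNA$", "age"], [],
                              groups, keywords)
```

With the default `columns_to_include` of `[".*"]` and an empty `columns_to_exclude`, this sketch selects every column.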

Fields
field strategy: str = 'mean'

Imputation strategy. One of: mean, median, most_frequent, constant.

field fill_value: Any = None
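
To illustrate what the two fields control, the sketch below computes a per-column fill value for each strategy with pandas. The strategy names coincide with those of scikit-learn's `SimpleImputer`, and the sketch assumes the same meaning, including that `fill_value` is only used with the `constant` strategy; the source does not confirm either point. The function `impute_frame` is hypothetical and is not part of logml.

```python
import pandas as pd


def impute_frame(df, strategy="mean", fill_value=None):
    """Fill missing values column by column (illustrative sketch, not logml code)."""
    if strategy == "mean":
        values = df.mean(numeric_only=True)
    elif strategy == "median":
        values = df.median(numeric_only=True)
    elif strategy == "most_frequent":
        # mode() can return several rows on ties; row 0 holds the smallest mode.
        values = df.mode().iloc[0]
    elif strategy == "constant":
        # Assumption: fill_value applies to every affected column.
        values = pd.Series(fill_value, index=df.columns)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # fillna with a Series fills each column with the value under its label.
    return df.fillna(values)


frame = pd.DataFrame({
    "age": [30.0, None, 50.0, 40.0],
    "dose": [1.0, 1.0, None, 3.0],
})

by_mean = impute_frame(frame, "mean")
by_median = impute_frame(frame, "median")
by_mode = impute_frame(frame, "most_frequent")
by_constant = impute_frame(frame, "constant", fill_value=0)
```

Each call returns a new frame and leaves `frame` unchanged.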

class logml.data.transformers.imputation.SimpleImputeTransformer(**kwargs)

Bases: logml.data.base.BaseTransformer

Provides simple imputation functionality (mean, median, most frequent value, or constant fill).

LABEL = 'impute'
CONFIG_CLASS

alias of logml.data.transformers.imputation.SimpleImputerParams

fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)

Prepares the transformer for subsequent ‘transform’ calls.

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Applies transformations and returns the result dataframe.
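
The fit/transform contract described above can be sketched with a small stand-in class. `MeanImputerSketch` is hypothetical and does not reproduce logml's implementation; it only shows the usual pattern of learning fill values in `fit` and reusing them on any later frame in `transform`. The constructor argument `columns` and the attribute `fill_values_` are assumptions made for the example.

```python
from typing import List, Optional

import pandas as pd


class MeanImputerSketch:
    """Hypothetical stand-in that follows a fit/transform contract; not logml code."""

    def __init__(self, columns: List[str]):
        self.columns = columns
        self.affected_columns_: List[str] = []
        self.fill_values_: Optional[pd.Series] = None

    def fit(self, dataframe: pd.DataFrame) -> None:
        # Learn the statistics once, only from the frame passed to fit.
        self.affected_columns_ = [c for c in self.columns if c in dataframe.columns]
        self.fill_values_ = dataframe[self.affected_columns_].mean()

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        if self.fill_values_ is None:
            raise RuntimeError("call fit() before transform()")
        # Return a new frame; only the affected columns are filled.
        return dataframe.fillna(self.fill_values_)


train = pd.DataFrame({"x": [1.0, 3.0, None], "y": [None, 2.0, 4.0]})
new_data = pd.DataFrame({"x": [None, 10.0], "y": [None, 8.0]})

imputer = MeanImputerSketch(columns=["x"])
imputer.fit(train)
result = imputer.transform(new_data)
```

Here the missing `x` in `new_data` is filled with the mean learned from `train`, not with a statistic of `new_data` itself, and `y` is left untouched because it is not an affected column.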

update_transform_log(change: logml.data.utils.DataTransformLogItem)

Add custom data to the log.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]

class logml.data.transformers.imputation.MICEImputeTransformer(*args, **kwargs)

Bases: logml.data.base.BaseTransformer

Provides MICE imputation functionality (Multivariate Imputation by Chained Equations). NOTE: Affected columns are additionally filtered so that only numerical columns are imputed.
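
As a rough illustration of the chained-equations idea, the sketch below imputes each incomplete numeric column by regressing it on the other numeric columns and repeats the cycle a few times. It is a simplified, deterministic variant (plain least squares, no random draws, a single imputed dataset) and is not logml's implementation. The function `chained_impute` and its `n_iter` parameter are hypothetical.

```python
import numpy as np
import pandas as pd


def chained_impute(df: pd.DataFrame, n_iter: int = 5) -> pd.DataFrame:
    """Simplified, deterministic chained-equations imputation (illustrative only)."""
    out = df.copy()
    # Mirror the note above: only numeric columns take part in the imputation.
    numeric = out.select_dtypes(include="number").columns
    values = out[numeric].to_numpy(dtype=float)
    missing = np.isnan(values)

    # Step 1: start from a simple per-column mean fill.
    col_means = np.nanmean(values, axis=0)
    values = np.where(missing, col_means, values)

    # Step 2: repeatedly re-estimate each incomplete column from the others.
    for _ in range(n_iter):
        for j in range(values.shape[1]):
            if not missing[:, j].any():
                continue
            others = np.delete(values, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            observed = ~missing[:, j]
            # Fit on rows where column j was observed, predict where it was not.
            coef, *_ = np.linalg.lstsq(design[observed], values[observed, j],
                                       rcond=None)
            values[missing[:, j], j] = design[missing[:, j]] @ coef

    out[numeric] = values
    return out


frame = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [3.0, 5.0, np.nan, 9.0, 11.0],   # b = 2 * a + 1 where observed
    "site": ["x", "y", "x", None, "y"],   # non-numeric, left as-is
})
imputed = chained_impute(frame)
```

Because `b` is an exact linear function of `a` in this toy frame, the regressions recover the missing entries closely, while the missing value in the non-numeric `site` column is not touched.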

LABEL = 'impute_mice'
CONFIG_CLASS

alias of logml.data.config.MICETransformerParams

fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)

Prepares the transformer for subsequent ‘transform’ calls.

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Applies transformations and returns the result dataframe.

update_transform_log(change: logml.data.utils.DataTransformLogItem)

Add custom data to the log.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]