logml.data.pipeline

Classes

PreprocessingDebugStep([tr, df, md, exc])

PreprocessingPipeline(data_preprocessing_cfg)

Entity that provides a .fit/.transform interface for data preprocessing based on a 'DatasetPreprocessingSection' object.

class logml.data.pipeline.PreprocessingDebugStep(tr: object = None, df: pandas.core.frame.DataFrame = None, md: logml.data.metadata.DatasetMetadata = None, exc: Exception = None)

Bases: object

tr: object = None
df: pandas.core.frame.DataFrame = None
md: logml.data.metadata.DatasetMetadata = None
exc: Exception = None
class logml.data.pipeline.PreprocessingPipeline(data_preprocessing_cfg: DatasetPreprocessingSection, metadata_cfg: logml.configuration.modeling.ModelingTaskSpec = None, cfg: GlobalConfig = None, global_params: dict = None, logger=None, debug=False)

Bases: object

Entity that provides a .fit/.transform interface for data preprocessing based on a ‘DatasetPreprocessingSection’ object.

reset()

Parses the given config and creates the set of required transformers.

fit_transform(dataframe: pandas.core.frame.DataFrame, dataset_metadata: logml.data.metadata.DatasetMetadata = None) → Optional[pandas.core.frame.DataFrame]

Fits the transformers on a given dataframe and transforms it in the process.

Parameters
  • dataframe – Incoming dataframe to process. Each step is fitted on the dataframe in sequence and then transforms it for the following steps.

  • dataset_metadata – Optional metadata object. Will be modified when needed by transformers, e.g. when generating new columns or changing data type.

Returns

The transformed dataframe on which the pipeline was fitted.
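
The sequential fit-then-transform behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not logml's implementation: FillMissingWithMean, AddSquaredColumn and SketchPipeline are hypothetical names, and a plain dict stands in for DatasetMetadata.

```python
import pandas as pd


class FillMissingWithMean:
    """Hypothetical step: learns column means, then fills missing values."""

    def fit(self, df, metadata=None):
        self.means_ = df.mean(numeric_only=True)
        return self

    def transform(self, df, metadata=None):
        return df.fillna(self.means_)


class AddSquaredColumn:
    """Hypothetical step: derives a new column and records it in the metadata."""

    def __init__(self, column):
        self.column = column

    def fit(self, df, metadata=None):
        return self  # nothing to learn for this step

    def transform(self, df, metadata=None):
        new_name = f"{self.column}_squared"
        out = df.assign(**{new_name: df[self.column] ** 2})
        if metadata is not None:
            # A plain dict stands in for the real metadata object (assumption).
            metadata.setdefault("columns", []).append(new_name)
        return out


class SketchPipeline:
    """Hypothetical pipeline: each step is fitted on the output of the previous one."""

    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, df, metadata=None):
        for step in self.steps:
            df = step.fit(df, metadata).transform(df, metadata)
        return df


frame = pd.DataFrame({"x": [1.0, None, 3.0]})
meta = {"columns": ["x"]}
pipeline = SketchPipeline([FillMissingWithMean(), AddSquaredColumn("x")])
result = pipeline.fit_transform(frame, meta)
```

In this sketch the second step sees the already imputed column, so the derived column contains no missing values, and the metadata stand-in gains the new column name as a side effect.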

transform(dataframe: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Transforms a given dataframe using the pipeline's inner steps.
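
As a rough illustration of how a transform call differs from fit_transform, the sketch below applies statistics learned from one dataframe to another. It assumes the usual fit/transform convention in which transform reuses state stored during fitting; SketchScaler is a hypothetical step, not part of logml.

```python
import pandas as pd


class SketchScaler:
    """Hypothetical step: standardises columns with statistics learned in fit()."""

    def fit(self, df):
        self.mean_ = df.mean()
        self.std_ = df.std(ddof=0)  # population standard deviation
        return self

    def transform(self, df):
        # Uses the stored statistics; nothing is re-estimated from `df`.
        return (df - self.mean_) / self.std_


train = pd.DataFrame({"x": [0.0, 2.0, 4.0]})
new_data = pd.DataFrame({"x": [2.0, 6.0]})

scaler = SketchScaler().fit(train)
scaled_new = scaler.transform(new_data)
```

Here the value 2.0 in new_data maps to zero because it equals the mean of the training column, not the mean of new_data.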

get_corr_groups() → Optional[pandas.core.frame.DataFrame]

If a ‘remove_correlated_columns’ transformer exists, tries to fetch its groups.
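
The exact shape of the returned groups is not specified here. Purely as an assumption, the sketch below shows one way groups of correlated columns could be computed and returned as a dataframe, with None when no group is found. The function sketch_corr_groups, its threshold parameter, and the output column names are hypothetical and may not match what the ‘remove_correlated_columns’ transformer does.

```python
import pandas as pd


def sketch_corr_groups(df, threshold=0.9):
    """Greedily group columns whose absolute Pearson correlation exceeds `threshold`."""
    corr = df.corr().abs()
    remaining = list(corr.columns)
    rows = []
    group_id = 0
    while remaining:
        anchor = remaining.pop(0)
        members = [c for c in remaining if corr.loc[anchor, c] > threshold]
        if members:
            rows.append({"group": group_id, "kept": anchor, "correlated": members})
            group_id += 1
            # Columns assigned to a group are not considered again.
            remaining = [c for c in remaining if c not in members]
    return pd.DataFrame(rows) if rows else None


data = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with "a"
    "c": [1.0, -1.0, 1.0, -1.0],  # weakly correlated with "a" and "b"
})
groups = sketch_corr_groups(data)
no_groups = sketch_corr_groups(data[["a", "c"]])
```

Returning None when nothing is grouped mirrors the Optional return type in the signature above.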