logml.data.pipeline
Classes
- PreprocessingDebugStep
- PreprocessingPipeline: Entity that provides .fit/.transform interface to handle data preprocessing based on 'DatasetPreprocessingSection' object.
- class logml.data.pipeline.PreprocessingDebugStep(tr: object = None, df: pandas.core.frame.DataFrame = None, md: logml.data.metadata.DatasetMetadata = None, exc: Exception = None)
Bases:
object
- tr: object = None
- df: pandas.core.frame.DataFrame = None
- md: logml.data.metadata.DatasetMetadata = None
- exc: Exception = None
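The signature above reads like a plain record with four optional fields. The sketch below shows how such a record could capture the state of one preprocessing step for debugging. It is an illustration under assumptions, not logml's implementation: `DebugStep` and `run_step_with_debug` are hypothetical names, and the field meanings (transformer, data, metadata, exception) are inferred from the field names and types only.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DebugStep:
    """Hypothetical stand-in mirroring the four documented fields."""
    tr: Optional[object] = None      # the transformer that was applied (assumed meaning)
    df: Any = None                   # the data seen at this step (assumed meaning)
    md: Any = None                   # the metadata at this step (assumed meaning)
    exc: Optional[Exception] = None  # the exception raised by the step, if any


def run_step_with_debug(transform, data):
    """Apply a callable to the data and record what happened."""
    try:
        return DebugStep(tr=transform, df=transform(data))
    except Exception as exc:
        # Keep the input data so the failing step can be inspected later.
        return DebugStep(tr=transform, df=data, exc=exc)


ok = run_step_with_debug(lambda rows: [r * 2 for r in rows], [1, 2, 3])
failed = run_step_with_debug(lambda rows: rows[10], [1, 2, 3])
```

Here `ok.exc` stays `None`, while `failed.exc` holds the exception and `failed.df` holds the untouched input.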
- class logml.data.pipeline.PreprocessingPipeline(data_preprocessing_cfg: DatasetPreprocessingSection, metadata_cfg: logml.configuration.modeling.ModelingTaskSpec = None, cfg: GlobalConfig = None, global_params: dict = None, logger=None, debug=False)
Bases:
object
Entity that provides .fit/.transform interface to handle data preprocessing based on ‘DatasetPreprocessingSection’ object.
- reset()
Parses a given config and creates a set of required transformers.
- fit_transform(dataframe: pandas.core.frame.DataFrame, dataset_metadata: logml.data.metadata.DatasetMetadata = None) → Optional[pandas.core.frame.DataFrame]
Fits transformers using a given dataframe, and transforms it during this process.
- Parameters
dataframe – Incoming dataframe to process. Each step is fitted on the dataframe in sequence and then transforms it for the following steps.
dataset_metadata – Optional metadata object. Transformers modify it when needed, e.g. when generating new columns or changing a data type.
- Returns
The transformed dataframe on which the pipeline was fitted.
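The note on dataset_metadata suggests that a transformer which adds or retypes columns also updates the metadata object in place. The sketch below illustrates that idea only. Plain dicts stand in for the DataFrame and for DatasetMetadata so the example has no dependencies, and `OneHotStep` and the `column_types` mapping are hypothetical, not part of logml.

```python
class OneHotStep:
    """Hypothetical step: replaces one categorical column with 0/1 columns."""

    def __init__(self, column):
        self.column = column
        self.categories = []

    def fit(self, table):
        # Learn the category set from the data the step is fitted on.
        self.categories = sorted(set(table[self.column]))
        return self

    def transform(self, table, metadata=None):
        out = {k: v for k, v in table.items() if k != self.column}
        for cat in self.categories:
            name = f"{self.column}_{cat}"
            out[name] = [1 if v == cat else 0 for v in table[self.column]]
            if metadata is not None:
                # Register the generated column in the metadata.
                metadata["column_types"][name] = "binary"
        if metadata is not None:
            # The source column no longer exists after encoding.
            metadata["column_types"].pop(self.column, None)
        return out


table = {"color": ["red", "blue", "red"], "size": [1, 2, 3]}
metadata = {"column_types": {"color": "categorical", "size": "numeric"}}
encoded = OneHotStep("color").fit(table).transform(table, metadata)
```

After the call, `metadata` describes the generated columns instead of the original `color` column, so later steps see a description that matches the data.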
- transform(dataframe: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame
Transforms a given dataframe using the inner Steps.
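The difference between fit_transform and transform can be sketched as follows: fit_transform fits each step on the output of the previous one, while transform only replays the already fitted steps on new data. This is a minimal illustration of the general fit/transform pattern, not logml's code. `SketchPipeline`, `FillMissing`, and `Center` are hypothetical, and a dict of column lists stands in for a pandas DataFrame.

```python
from statistics import mean


class FillMissing:
    """Hypothetical step: replaces None with the column mean learned in fit."""

    def __init__(self, column):
        self.column = column
        self.fill_value = None

    def fit(self, table):
        self.fill_value = mean(v for v in table[self.column] if v is not None)
        return self

    def transform(self, table):
        out = dict(table)
        out[self.column] = [self.fill_value if v is None else v
                            for v in table[self.column]]
        return out


class Center:
    """Hypothetical step: subtracts the column mean learned in fit."""

    def __init__(self, column):
        self.column = column
        self.mean_ = None

    def fit(self, table):
        self.mean_ = mean(table[self.column])
        return self

    def transform(self, table):
        out = dict(table)
        out[self.column] = [v - self.mean_ for v in table[self.column]]
        return out


class SketchPipeline:
    def __init__(self, steps):
        self.steps = steps

    def fit_transform(self, table):
        # Each step is fitted on the output of the previous steps.
        for step in self.steps:
            table = step.fit(table).transform(table)
        return table

    def transform(self, table):
        # No fitting here: reuse the state learned in fit_transform.
        for step in self.steps:
            table = step.transform(table)
        return table


pipeline = SketchPipeline([FillMissing("x"), Center("x")])
train_out = pipeline.fit_transform({"x": [1.0, None, 3.0]})
new_out = pipeline.transform({"x": [None, 5.0]})
```

Both the fill value and the centering mean come from the training data, so `new_out` is computed with the statistics learned during fit_transform rather than with those of the new data.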
- get_corr_groups() → Optional[pandas.core.frame.DataFrame]
If a ‘remove_correlated_columns’ transformer exists, tries to fetch its groups.