logml.data.transformers.encoding
Classes
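- BinaryEncodingTransformer: Provides binary encoding functionality.
- BucketizeTransformer: Encodes values based on a list of buckets (intervals + labels).
- DNAIndicatorsBinarizationTransformer: Provides binarization functionality for _DNA columns.
- DateTimeEncodingTransformer: Encode datetime columns for ML.
- LabelEncodingTransformer: Provides label encoding functionality.
- MapEncodingTransformer: Encode values according to the map provided.
- MultiLabelEncodingTransformer: Provides one-hot encoding functionality for multi-label dtypes.
- OneHotEncodingTransformer: Provides one-hot encoding functionality.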
- class logml.data.transformers.encoding.BinaryEncodingTransformer(**kwargs)
Bases: logml.data.base.BaseTransformer
Provides binary encoding functionality.
- LABEL = 'binary_encoding'
- ENCODING = 'binary'
- DEFAULT_PARAMS = {}
- CONFIG_CLASS
- fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)
Fits binary encoder.
- transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Applies required encoders, drops original columns.
- update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None
Update metadata according to the change made.
- params: BaseTransformerParams
- global_params: Dict
- metadata_cfg: ModelingTaskSpec
- affected_columns_: List[str]
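The reference does not say which binary scheme this transformer uses. As a rough illustration only, the sketch below assumes the common variant in which each category gets an integer index and the base-2 digits of that index become the new columns. The function name `binary_encode_sketch` and the `_bin{n}` column suffix are hypothetical and are not part of logml.

```python
import pandas as pd


def binary_encode_sketch(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Hypothetical helper: replace column `name` with base-2 digit columns.

    Assumes the column has no missing values.
    """
    # Assign each distinct value an integer index in sorted order.
    categories = sorted(df[name].unique())
    index_of = {value: i for i, value in enumerate(categories)}
    codes = [index_of[value] for value in df[name]]

    # Number of bits needed to represent the largest index (at least one).
    width = max(1, (len(categories) - 1).bit_length())

    out = df.drop(columns=[name])
    for bit in range(width):
        shift = width - 1 - bit  # most significant bit first
        out[f"{name}_bin{bit}"] = [(code >> shift) & 1 for code in codes]
    return out


df = pd.DataFrame({"stage": ["I", "II", "III", "IV", "II"]})
encoded = binary_encode_sketch(df, "stage")
print(encoded)
```

Four categories fit into two bit columns here, where one-hot encoding would need four.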
- class logml.data.transformers.encoding.LabelEncodingTransformer(**kwargs)
Bases: logml.data.base.BaseTransformer
Provides label encoding functionality.
- LABEL = 'label_encoding'
- ENCODING = 'label'
- CONFIG_CLASS
- fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)
Fits required encoders.
- update_transform_log(change: logml.data.utils.DataTransformLogItem)
See parent description.
- transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Applies required encoders.
- update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None
Update metadata according to the change made.
- params: BaseTransformerParams
- global_params: Dict
- metadata_cfg: ModelingTaskSpec
- affected_columns_: List[str]
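To make the fit/transform split for label encoding concrete, here is a minimal pandas sketch. It is not the logml implementation. The helper names are hypothetical, and assigning codes in sorted order is an assumption.

```python
import pandas as pd


def fit_label_encoder(column: pd.Series) -> dict:
    """Hypothetical helper: map each distinct non-null value to an integer code."""
    categories = sorted(column.dropna().unique())
    return {value: code for code, value in enumerate(categories)}


def apply_label_encoder(column: pd.Series, mapping: dict) -> pd.Series:
    """Replace values with their fitted codes; unseen or missing values become NaN."""
    return column.map(mapping)


train = pd.Series(["low", "high", "mid", "low"], name="grade")
mapping = fit_label_encoder(train)
encoded = apply_label_encoder(train, mapping)
print(mapping)
print(encoded.tolist())
```

Because the mapping is learned once in the fit step, the same codes can be reused on later data.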
- class logml.data.transformers.encoding.OneHotEncodingTransformer(**kwargs)
Bases: logml.data.base.BaseTransformer
Provides one-hot encoding functionality.
- LABEL = 'one_hot'
- ENCODING = 'one_hot'
- DEFAULT_PARAMS = {'handle_unknown': 'ignore'}
- CONFIG_CLASS
- fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)
Fits required encoders.
- transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Applies required encoders, drops original columns.
- update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None
Update metadata according to the change made.
- params: BaseTransformerParams
- global_params: Dict
- metadata_cfg: ModelingTaskSpec
- affected_columns_: List[str]
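The default `handle_unknown: 'ignore'` suggests that categories not seen during fitting produce an all-zero row rather than an error. The sketch below illustrates that behaviour in plain pandas. The helper names and the `{column}_{category}` naming are assumptions, not the logml API.

```python
import pandas as pd


def fit_one_hot(column: pd.Series) -> list:
    """Hypothetical helper: remember the categories seen during fitting."""
    return sorted(column.dropna().unique())


def apply_one_hot(df: pd.DataFrame, name: str, categories: list) -> pd.DataFrame:
    """Add one indicator column per fitted category and drop the original column."""
    out = df.drop(columns=[name])
    for category in categories:
        # Values outside the fitted categories match nothing, so they stay 0.
        out[f"{name}_{category}"] = (df[name] == category).astype(int)
    return out


train = pd.DataFrame({"color": ["red", "green", "red"]})
new_data = pd.DataFrame({"color": ["green", "blue"]})  # "blue" was never fitted

categories = fit_one_hot(train["color"])
encoded = apply_one_hot(new_data, "color", categories)
print(encoded)
```

The second row is all zeros because "blue" was not among the fitted categories.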
- class logml.data.transformers.encoding.MultiLabelEncodingTransformer(**kwargs)
Bases: logml.data.base.BaseTransformer
Provides one-hot encoding functionality for multi-label dtypes.
- LABEL = 'multi_label_encoding'
- ENCODING = 'multi_label'
- CONFIG_CLASS
alias of logml.data.config.MultiLabelOneHotTransformerParams
- fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)
Fits required encoders.
- transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Applies required encoders, drops original columns.
- update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None
Update metadata according to the change made.
- params: BaseTransformerParams
- global_params: Dict
- metadata_cfg: ModelingTaskSpec
- affected_columns_: List[str]
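For multi-label columns, each cell holds several labels at once, so one-hot encoding yields one indicator column per label. The sketch below assumes each cell is a Python list of labels. That representation, the helper name, and the column naming are all assumptions made for illustration.

```python
import pandas as pd


def multi_label_one_hot(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Hypothetical helper: expand a column of label lists into indicator columns."""
    # Collect every label that occurs in any cell.
    labels = sorted({label for cell in df[name] for label in cell})

    out = df.drop(columns=[name])
    for label in labels:
        out[f"{name}_{label}"] = [int(label in cell) for cell in df[name]]
    return out


df = pd.DataFrame({"sites": [["liver", "lung"], [], ["lung"]]})
encoded = multi_label_one_hot(df, "sites")
print(encoded)
```

A row may have several indicators set, or none when its list is empty.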
- class logml.data.transformers.encoding.DNAIndicatorsBinarizationTransformer(params: logml.data.config.BaseTransformerParams, metadata_cfg: logml.configuration.modeling.ModelingTaskSpec = None, cfg: GlobalConfig = None, global_params: Dict = None, logger=None)
Bases: logml.data.base.BaseTransformer
Provides binarization functionality for _DNA columns.
- LABEL = 'binarize_dna'
- CONFIG_CLASS
- transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Binarizes a given set of columns.
- params: BaseTransformerParams
- global_params: Dict
- metadata_cfg: ModelingTaskSpec
- affected_columns_: List[str]
- class logml.data.transformers.encoding.MapEncodingTransformer(params: logml.data.config.BaseTransformerParams, metadata_cfg: logml.configuration.modeling.ModelingTaskSpec = None, cfg: GlobalConfig = None, global_params: Dict = None, logger=None)
Bases: logml.data.base.BaseTransformer
Encode values according to the map provided.
Sample config:
    steps:
      - transformer: map_encoding
        params:
          columns_to_include:
            - .*_DNA$
          mapping:
            AMP: 0
            DEL: 1
            REARG: 2
            SNP: 3
            VUS: 4
            WT: 5
          unknown_values: -1
- LABEL = 'map_encoding'
- CONFIG_CLASS
- transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Applies transformation.
- update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None
See parent description.
- params: BaseTransformerParams
- global_params: Dict
- metadata_cfg: ModelingTaskSpec
- affected_columns_: List[str]
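The effect of the map_encoding sample config above can be approximated in plain pandas as follows. This is a sketch, not the transformer's actual code. The function `map_encode_sketch` is hypothetical, and treating `columns_to_include` entries as regular expressions applied with `re.search` is an assumption.

```python
import re

import pandas as pd


def map_encode_sketch(df: pd.DataFrame, column_patterns: list,
                      mapping: dict, unknown_value: int) -> pd.DataFrame:
    """Hypothetical helper: map values in matching columns to integer codes."""
    out = df.copy()
    for column in out.columns:
        if any(re.search(pattern, str(column)) for pattern in column_patterns):
            # Values absent from the mapping (including missing values)
            # become NaN after map() and are replaced with unknown_value.
            out[column] = out[column].map(mapping).fillna(unknown_value).astype(int)
    return out


df = pd.DataFrame({"TP53_DNA": ["WT", "SNP", "???"], "age": [61, 47, 70]})
mapping = {"AMP": 0, "DEL": 1, "REARG": 2, "SNP": 3, "VUS": 4, "WT": 5}
encoded = map_encode_sketch(df, [r".*_DNA$"], mapping, unknown_value=-1)
print(encoded)
```

Only `TP53_DNA` matches the pattern, so `age` passes through unchanged.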
- class logml.data.transformers.encoding.BucketizeTransformer(*args, **kwargs)
Bases: logml.data.base.BaseTransformer
Encodes values based on a list of buckets (intervals + labels).
Sample config:
    steps:
      - transformer: bucketize
        params:
          columns_to_include:
            - PDL1_score
          suffix: _1_50_bucketized
          buckets:
            - left_bound: 0
              right_bound: 1
              alias: '<1%'
            - left_bound: 1
              right_bound: 50
              alias: '>=1%-<50%'
            - left_bound: 50
              right_bound: 100
              alias: '>=50%'
          remove_base_columns: True
- LABEL = 'bucketize'
- CONFIG_CLASS
- static bucketize_column(column: pandas.core.series.Series, buckets: List[logml.data.config.BucketDefinition])
Returns a bucketized series based on a given list of bucket definitions.
- transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Applies transformation.
- update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None
Update metadata according to the change made.
- params: BaseTransformerParams
- global_params: Dict
- metadata_cfg: ModelingTaskSpec
- affected_columns_: List[str]
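The bucket definitions in the sample config above can be read as interval-to-label rules. The sketch below applies them with plain pandas. It assumes each bucket is closed on the left and open on the right, which the aliases suggest but the reference does not state. `bucketize_sketch` and the tuple layout are hypothetical stand-ins for `bucketize_column` and `BucketDefinition`.

```python
import pandas as pd


def bucketize_sketch(column: pd.Series, buckets: list) -> pd.Series:
    """Hypothetical helper: label each value with the alias of its bucket.

    `buckets` is a list of (left_bound, right_bound, alias) tuples.
    Values that fall into no bucket are left as None.
    """
    result = pd.Series([None] * len(column), index=column.index, dtype="object")
    for left_bound, right_bound, alias in buckets:
        # Assumed interval convention: left_bound <= value < right_bound.
        mask = (column >= left_bound) & (column < right_bound)
        result[mask] = alias
    return result


buckets = [(0, 1, "<1%"), (1, 50, ">=1%-<50%"), (50, 100, ">=50%")]
scores = pd.Series([0.5, 1.0, 49.9, 50.0, 120.0], name="PDL1_score")
bucketed = bucketize_sketch(scores, buckets)
print(bucketed.tolist())
```

Under this convention a score of exactly 100 would fall outside the last bucket, so the real transformer may treat the upper bound differently.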
- class logml.data.transformers.encoding.DateTimeEncodingTransformer(*args, **kwargs)
Bases: logml.data.base.BaseTransformer
Encode datetime columns for ML.
A datetime column produces three new columns:
  - {colname}_year_rel: number of years relative to the minimum value of the column.
  - {colname}_year_day_sin and {colname}_year_day_cos: day of the year, cyclically encoded.
- LABEL = 'encode_datetime'
- CONFIG_CLASS
- fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)
Fit by determining affected columns.
- transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame
Encode columns from a given dataframe.
- update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None
Update metadata according to the change made.
- params: BaseTransformerParams
- global_params: Dict
- metadata_cfg: ModelingTaskSpec
- affected_columns_: List[str]
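The three derived datetime columns can be sketched with pandas and NumPy as shown below. The exact formulas are assumptions. Here the relative year is the calendar-year difference from the earliest date, and the day of the year is mapped onto a circle with a period of 365.25 days. The function name is hypothetical.

```python
import numpy as np
import pandas as pd


def encode_datetime_sketch(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Hypothetical helper: replace a datetime column with three numeric columns."""
    dates = pd.to_datetime(df[name])
    out = df.drop(columns=[name])

    # Years elapsed since the earliest year in the column (assumed definition).
    out[f"{name}_year_rel"] = dates.dt.year - dates.dt.year.min()

    # Cyclic encoding: 1 January maps to angle 0, so late December
    # ends up close to early January on the unit circle.
    angle = 2 * np.pi * (dates.dt.dayofyear - 1) / 365.25
    out[f"{name}_year_day_sin"] = np.sin(angle)
    out[f"{name}_year_day_cos"] = np.cos(angle)
    return out


df = pd.DataFrame({"visit": ["2019-01-01", "2021-07-02"]})
encoded = encode_datetime_sketch(df, "visit")
print(encoded)
```

The sine/cosine pair keeps 31 December and 1 January close together, which a raw day-of-year number would not.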