logml.data.transformers.encoding

Classes

BinaryEncodingTransformer(**kwargs)

Provides binary encoding functionality.

BucketizeTransformer(*args, **kwargs)

Encodes values based on a list of buckets (intervals + labels).

DNAIndicatorsBinarizationTransformer(params)

Provides binarization functionality for _DNA columns.

DateTimeEncodingTransformer(*args, **kwargs)

Encode datetime columns for ML.

LabelEncodingTransformer(**kwargs)

Provides label encoding functionality.

MapEncodingTransformer(params[, ...])

Encode values according to the map provided.

MultiLabelEncodingTransformer(**kwargs)

Provides one-hot encoding functionality for multi-label dtypes.

OneHotEncodingTransformer(**kwargs)

Provides one-hot encoding functionality.

class logml.data.transformers.encoding.BinaryEncodingTransformer(**kwargs)

Bases: logml.data.base.BaseTransformer

Provides binary encoding functionality.

LABEL = 'binary_encoding'
ENCODING = 'binary'
DEFAULT_PARAMS = {}
CONFIG_CLASS

alias of logml.data.config.EncodingTransformerParams

fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)

Fits binary encoder.

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Applies the fitted encoders and drops the original columns.

update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None

Update metadata according to the change made.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
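To illustrate what binary encoding produces, here is a minimal pure-Python sketch that works on a plain list rather than a DataFrame and does not call logml. It assumes one common scheme: each distinct category gets an integer code starting at 1, and the binary digits of that code are spread across ceil(log2(n + 1)) separate 0/1 columns. The function name and the bit_<i> column names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def binary_encode(values: Sequence[str]) -> Dict[str, List[int]]:
    """Encode categories as binary digits spread over several 0/1 columns."""
    categories = list(dict.fromkeys(values))  # unique values, first-seen order
    # Codes start at 1, so no category is encoded as all zeros.
    index = {cat: i + 1 for i, cat in enumerate(categories)}
    n_bits = max(1, math.ceil(math.log2(len(categories) + 1)))
    columns: Dict[str, List[int]] = {f"bit_{b}": [] for b in range(n_bits)}
    for v in values:
        code = index[v]
        for b in range(n_bits):
            # bit_0 holds the most significant bit of the code.
            columns[f"bit_{b}"].append((code >> (n_bits - 1 - b)) & 1)
    return columns
```

Compared with one-hot encoding, n categories need only about log2(n) columns instead of n, at the cost of the columns no longer mapping one-to-one to categories.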
class logml.data.transformers.encoding.LabelEncodingTransformer(**kwargs)

Bases: logml.data.base.BaseTransformer

Provides label encoding functionality.

LABEL = 'label_encoding'
ENCODING = 'label'
CONFIG_CLASS

alias of logml.data.config.EncodingTransformerParams

fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)

Fits required encoders.

update_transform_log(change: logml.data.utils.DataTransformLogItem)

See BaseTransformer.update_transform_log.

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Applies the fitted encoders.

update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None

Update metadata according to the change made.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
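Label encoding replaces each category with a single integer code. The sketch below mirrors the fit/transform split on plain lists. Assigning codes in sorted order follows the scikit-learn LabelEncoder convention and is an assumption about logml; the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def fit_label_encoder(values: Sequence[str]) -> Dict[str, int]:
    """Fit step: map each distinct value to an integer code, in sorted order."""
    return {cat: code for code, cat in enumerate(sorted(set(values)))}


def apply_label_encoder(mapping: Dict[str, int], values: Sequence[str]) -> List[int]:
    """Transform step: replace each value by its fitted code.

    In this sketch a value not seen during fit raises KeyError.
    """
    return [mapping[v] for v in values]


mapping = fit_label_encoder(["low", "high", "mid", "low"])
encoded = apply_label_encoder(mapping, ["mid", "low"])
```

Unlike the one-hot and binary transformers, the transform docstring here does not mention dropping the original columns, which is consistent with the codes replacing the values in place (one output column per input column).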
class logml.data.transformers.encoding.OneHotEncodingTransformer(**kwargs)

Bases: logml.data.base.BaseTransformer

Provides one-hot encoding functionality.

LABEL = 'one_hot'
ENCODING = 'one_hot'
DEFAULT_PARAMS = {'handle_unknown': 'ignore'}
CONFIG_CLASS

alias of logml.data.config.EncodingTransformerParams

fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)

Fits required encoders.

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Applies the fitted encoders and drops the original columns.

update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None

Update metadata according to the change made.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
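DEFAULT_PARAMS sets handle_unknown to 'ignore'. With scikit-learn's OneHotEncoder, that option makes a category not seen during fit produce zeros in every indicator column instead of raising an error. The minimal sketch below reproduces that behaviour on plain lists, assuming the same semantics apply here. The {prefix}_{category} column naming and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def fit_one_hot(values: Sequence[str]) -> List[str]:
    """Fit step: learn the category vocabulary (sorted for a stable column order)."""
    return sorted(set(values))


def transform_one_hot(categories: List[str], values: Sequence[str], prefix: str) -> Dict[str, List[int]]:
    """Transform step: emit one 0/1 column per fitted category.

    A value absent from `categories` gets 0 in every column, mimicking
    handle_unknown='ignore'.
    """
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


cats = fit_one_hot(["red", "green", "red"])
out = transform_one_hot(cats, ["red", "blue", "green"], "color")
```

Here "blue" was never seen during fit, so its row is all zeros.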
class logml.data.transformers.encoding.MultiLabelEncodingTransformer(**kwargs)

Bases: logml.data.base.BaseTransformer

Provides one-hot encoding functionality for multi-label dtypes.

LABEL = 'multi_label_encoding'
ENCODING = 'multi_label'
CONFIG_CLASS

alias of logml.data.config.MultiLabelOneHotTransformerParams

fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)

Fits required encoders.

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Applies the fitted encoders and drops the original columns.

update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None

Update metadata according to the change made.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
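With multi-label data a single cell can carry several labels, so one row may have a 1 in more than one indicator column. The sketch assumes each cell is a Python list of labels. How logml actually stores multi-label values, and which options MultiLabelOneHotTransformerParams exposes, are not covered here. The function name and column naming are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def multi_label_one_hot(cells: Sequence[Iterable[str]], prefix: str) -> Dict[str, List[int]]:
    """Build one indicator column per label; a row gets 1 for every label it contains."""
    rows = [set(cell) for cell in cells]
    labels = sorted(set().union(*rows))  # every label seen in any row
    return {f"{prefix}_{lab}": [int(lab in row) for row in rows] for lab in labels}


out = multi_label_one_hot([["fever", "cough"], ["cough"], []], "symptoms")
```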
class logml.data.transformers.encoding.DNAIndicatorsBinarizationTransformer(params: logml.data.config.BaseTransformerParams, metadata_cfg: logml.configuration.modeling.ModelingTaskSpec = None, cfg: GlobalConfig = None, global_params: Dict = None, logger=None)

Bases: logml.data.base.BaseTransformer

Provides binarization functionality for _DNA columns.

LABEL = 'binarize_dna'
CONFIG_CLASS

alias of logml.data.config.BaseTransformerParams

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Binarizes a given set of columns.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
class logml.data.transformers.encoding.MapEncodingTransformer(params: logml.data.config.BaseTransformerParams, metadata_cfg: logml.configuration.modeling.ModelingTaskSpec = None, cfg: GlobalConfig = None, global_params: Dict = None, logger=None)

Bases: logml.data.base.BaseTransformer

Encode values according to the map provided.

Sample config:

steps:
  - transformer: map_encoding
    params:
        columns_to_include:
            - .*_DNA$
        mapping:
            AMP: 0
            DEL: 1
            REARG: 2
            SNP: 3
            VUS: 4
            WT: 5
        unknown_values: -1

LABEL = 'map_encoding'
CONFIG_CLASS

alias of logml.data.config.MapEncodingTransformerParams

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Applies transformation.

update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None

See BaseTransformer.update_metadata.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
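The sample config above can be read as two steps: select the columns whose names match a columns_to_include pattern, then look each value up in mapping. The minimal sketch below assumes that patterns are applied with re.match and that unknown_values is the code given to any value missing from the map. Both helpers are hypothetical and are not logml's API.

```python
import re
from typing import Dict, List, Sequence

# Mapping copied from the sample config.
MAPPING = {"AMP": 0, "DEL": 1, "REARG": 2, "SNP": 3, "VUS": 4, "WT": 5}


def select_columns(columns: Sequence[str], patterns: Sequence[str]) -> List[str]:
    """Keep column names that match any of the given regular expressions."""
    return [c for c in columns if any(re.match(p, c) for p in patterns)]


def map_encode(values: Sequence[str], mapping: Dict[str, int], unknown_value: int) -> List[int]:
    """Look each value up in `mapping`; values absent from it become `unknown_value`."""
    return [mapping.get(v, unknown_value) for v in values]


selected = select_columns(["TP53_DNA", "age", "KRAS_DNA"], [r".*_DNA$"])
codes = map_encode(["WT", "AMP", "FOO"], MAPPING, -1)
```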
class logml.data.transformers.encoding.BucketizeTransformer(*args, **kwargs)

Bases: logml.data.base.BaseTransformer

Encodes values based on a list of buckets (intervals + labels).

Sample config:

steps:
  - transformer: bucketize
    params:
        columns_to_include:
            - PDL1_score
        suffix: _1_50_bucketized
        buckets:
            - left_bound: 0
              right_bound: 1
              alias: '<1%'
            - left_bound: 1
              right_bound: 50
              alias: '>=1%-<50%'
            - left_bound: 50
              right_bound: 100
              alias: '>=50%'
        remove_base_columns: True

LABEL = 'bucketize'
CONFIG_CLASS

alias of logml.data.config.BucketizeTransformerParams

static bucketize_column(column: pandas.core.series.Series, buckets: List[logml.data.config.BucketDefinition])

Returns a bucketized series based on a given list of bucket definitions.

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Applies transformation.

update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None

Update metadata according to the change made.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
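Here is a sketch of the bucketing logic that bucketize_column and the sample config describe, applied to a dict of plain lists. It rests on several assumptions:

- Intervals are half-open, [left_bound, right_bound), which is consistent with the '<1%', '>=1%-<50%' and '>=50%' aliases.
- The last bucket also includes its right bound.
- Values outside every bucket become None.
- The output column is named base name + suffix.

The Bucket dataclass is a hypothetical stand-in for BucketDefinition.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Bucket:
    """Hypothetical stand-in for BucketDefinition (fields from the sample config)."""
    left_bound: float
    right_bound: float
    alias: str


def bucketize(values: Sequence[float], buckets: List[Bucket]) -> List[Optional[str]]:
    """Return, per value, the alias of the first bucket that contains it (or None)."""
    out: List[Optional[str]] = []
    for v in values:
        label = None
        for i, b in enumerate(buckets):
            is_last = i == len(buckets) - 1
            # Half-open [left, right); the last bucket also accepts v == right_bound.
            if b.left_bound <= v < b.right_bound or (is_last and v == b.right_bound):
                label = b.alias
                break
        out.append(label)
    return out


# Mirrors the sample config: one column, a suffix, and remove_base_columns: True.
table = {"PDL1_score": [0.5, 1, 49.9, 50, 100]}
buckets = [Bucket(0, 1, "<1%"), Bucket(1, 50, ">=1%-<50%"), Bucket(50, 100, ">=50%")]
suffix = "_1_50_bucketized"
for col in ["PDL1_score"]:
    table[col + suffix] = bucketize(table[col], buckets)
    del table[col]  # remove the base column
```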
class logml.data.transformers.encoding.DateTimeEncodingTransformer(*args, **kwargs)

Bases: logml.data.base.BaseTransformer

Encode datetime columns for ML.

Each datetime column produces three new columns:

- {colname}_year_rel: the number of years elapsed since the minimum value of the column.
- {colname}_year_day_sin and {colname}_year_day_cos: the day of the year, cyclically encoded.

LABEL = 'encode_datetime'
CONFIG_CLASS

alias of logml.data.config.BaseTransformerParams

fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)

Fit by determining affected columns.

transform(dataframe: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Encode columns from a given dataframe.

update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) -> None

Update metadata according to the change made.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
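This sketch builds the three datetime features from Python date objects. The exact formulas are assumptions and are not taken from logml:

- year_rel is the number of days since the column minimum, divided by 365.25.
- The day of the year is mapped to the angle 2π·day/365.25 before taking its sine and cosine.

The function name is hypothetical.

```python
import math
from datetime import date
from typing import Dict, List, Sequence

DAYS_PER_YEAR = 365.25  # assumption: average year length, used for both features


def encode_dates(values: Sequence[date], colname: str) -> Dict[str, List[float]]:
    """Build the three datetime feature columns for one column of dates."""
    start = min(values)
    # Fractional years elapsed since the earliest date in the column.
    year_rel = [(d - start).days / DAYS_PER_YEAR for d in values]
    # Day of year (1..366) mapped to an angle on the unit circle.
    angles = [2 * math.pi * d.timetuple().tm_yday / DAYS_PER_YEAR for d in values]
    return {
        f"{colname}_year_rel": year_rel,
        f"{colname}_year_day_sin": [math.sin(a) for a in angles],
        f"{colname}_year_day_cos": [math.cos(a) for a in angles],
    }


enc = encode_dates([date(2020, 12, 31), date(2021, 1, 1), date(2020, 1, 1)], "visit")
```

The sine/cosine pair places 31 December and 1 January next to each other. A raw day-of-year number (366 vs 1) would put them at opposite ends of the scale.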