logml.data.transformers.lambdas

Functions

convert_value_to_float(x[, eps])

Converts a given value to float by removing and parsing special characters.

resolve_multiple_choice(value[, ...])

Implements different strategies for handling list-type values.

Classes

BaseLambdaTransformer(**kwargs)

Simple transformations with lambda functions.

BinarizationLambdaTransformer(**kwargs)

Binarizes all target columns using a given threshold.

ConvertToFloatTransformer(**kwargs)

Converts column values to floats by removing and parsing special characters.

Log1pLambdaTransformer(**kwargs)

Applies 'log1p' transformation.

LogLambdaTransformer(**kwargs)

Applies 'log' transformation.

QueryBooleanTransformer(**kwargs)

Transforms a column to boolean using a query.

ReplaceValueTransformer(params[, ...])

Replace values according to the map provided.

ResolveMultipleChoiceTransformer(**kwargs)

Resolves multi-value issue for list-type columns.

class logml.data.transformers.lambdas.ReplaceValuesTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for ReplaceValueTransformer.

JSON schema:
{
   "title": "ReplaceValuesTransformerParams",
   "description": "Defines schema for ReplaceValueTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "mapping": {
         "title": "Mapping",
         "type": "object"
      }
   },
   "required": [
      "mapping"
   ]
}

Fields
field mapping: Dict [Required]
class logml.data.transformers.lambdas.ReplaceValueTransformer(params: logml.data.config.BaseTransformerParams, metadata_cfg: logml.configuration.modeling.ModelingTaskSpec = None, cfg: GlobalConfig = None, global_params: Dict = None, logger=None)

Bases: logml.data.base.BaseTransformer

Replaces values according to the provided mapping. Values that are not listed in the mapping are left unchanged.

Note: YAML natively supports special values such as .nan; see https://yaml.org/spec/1.2.2/

Sample config:

steps:
  - transformer: replace_value
    params:
        columns_to_include:
            - .*_DNA$
        mapping:
            # combine two categories into the same
            VUS: 'VUS_WT'
            WT: 'VUS_WT'
LABEL = 'replace_value'
CONFIG_CLASS

alias of logml.data.transformers.lambdas.ReplaceValuesTransformerParams

transform(dataframe: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Applies transformation.
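The effect of the sample config above can be approximated in plain pandas. This is a minimal sketch, not the transformer's actual code: the helper name replace_values_sketch, the column names, and the data are made up for illustration, and it assumes each filtering expression is a regular expression matched against the column name.

```python
import re

import pandas as pd


def replace_values_sketch(df, mapping, columns_to_include=(".*",)):
    """Illustrative only; the real transformer's column matching may differ."""
    # Assumption: each filtering expression is a regex searched against the column name.
    selected = [
        col for col in df.columns
        if any(re.search(pattern, col) for pattern in columns_to_include)
    ]
    out = df.copy()
    for col in selected:
        # Series.replace with a dict leaves values that are not keys untouched.
        out[col] = out[col].replace(mapping)
    return out


# Hypothetical data: only the column ending in _DNA is selected.
df = pd.DataFrame({"TP53_DNA": ["VUS", "WT", "MUT"], "TP53_RNA": ["WT", "WT", "VUS"]})
result = replace_values_sketch(df, {"VUS": "VUS_WT", "WT": "VUS_WT"}, [r".*_DNA$"])
```

Here 'VUS' and 'WT' in the selected column are merged into 'VUS_WT', while 'MUT' and the unselected column stay as they were.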

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
logml.data.transformers.lambdas.convert_value_to_float(x, eps=1e-05)

Converts a given value to float by removing and parsing special characters.

Special characters supported:
  • ‘%’ suffix is removed

  • ‘<’ is interpreted by casting the rest and subtracting EPS

  • ‘>’ is interpreted by casting the rest and adding EPS

The expected schema: [<, >]{float}[%]
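The rules above can be sketched as a small standalone function. This is an illustrative re-implementation under stated assumptions, not the library's code: the name convert_value_to_float_sketch is hypothetical, the '%' suffix is assumed to be dropped without rescaling, and unparseable input is assumed to become NaN (as documented for ConvertToFloatTransformer).

```python
import math


def convert_value_to_float_sketch(x, eps=1e-05):
    """Illustrative only; follows the schema [<, >]{float}[%]."""
    if isinstance(x, (int, float)):
        return float(x)
    text = str(x).strip()
    # A trailing '%' is removed (assumption: the number is not divided by 100).
    if text.endswith("%"):
        text = text[:-1].strip()
    offset = 0.0
    if text.startswith("<"):
        # '<5' is read as slightly below 5.
        offset, text = -eps, text[1:]
    elif text.startswith(">"):
        # '>5' is read as slightly above 5.
        offset, text = eps, text[1:]
    try:
        return float(text) + offset
    except ValueError:
        # Assumption: values that cannot be parsed become NaN.
        return math.nan
```

With these assumptions, '12.5%' becomes 12.5 and '<5' becomes 5 minus eps.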

logml.data.transformers.lambdas.resolve_multiple_choice(value, keep_first_value: bool = True, delimeter: str = ',', **_kwargs)

Implements different strategies for handling list-type values.

  • ‘keep_first_value’ - keeps only the first element after splitting the value on the given delimiter (the ‘delimeter’ argument).
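A minimal sketch of the ‘keep_first_value’ strategy follows. It mirrors the documented signature (including the parameter spelling ‘delimeter’), but the function name is hypothetical, and the handling of non-string values and of surrounding whitespace is an assumption rather than documented behaviour.

```python
def resolve_multiple_choice_sketch(value, keep_first_value=True, delimeter=",", **_kwargs):
    """Illustrative only; not the library's implementation."""
    # Assumption: non-string values (numbers, None, NaN) pass through unchanged.
    if not isinstance(value, str):
        return value
    if keep_first_value:
        # Assumption: whitespace around the kept element is stripped.
        return value.split(delimeter)[0].strip()
    return value
```

For example, 'A, B, C' resolves to 'A' with the default delimiter.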

class logml.data.transformers.lambdas.BaseLambdaTransformer(**kwargs)

Bases: logml.data.base.BaseTransformer

Simple transformations with lambda functions.

LABEL = None
CONFIG_CLASS

alias of logml.data.config.BaseTransformerParams

transform(dataframe: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Applies a lambda function defined by LABEL to all affected columns.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
class logml.data.transformers.lambdas.Log1pLambdaTransformer(**kwargs)

Bases: logml.data.transformers.lambdas.BaseLambdaTransformer

Applies ‘log1p’ transformation.

LABEL = 'log1p'
params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
class logml.data.transformers.lambdas.LogLambdaTransformer(**kwargs)

Bases: logml.data.transformers.lambdas.BaseLambdaTransformer

Applies ‘log’ transformation.
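The practical difference between the ‘log’ and ‘log1p’ transformations shows up at zero. The sketch below uses the standard library only and assumes both refer to the natural logarithm; how the library itself treats non-positive input is not stated here, so the NaN fallback is an assumption.

```python
import math

values = [0.0, 1.0, 9.0]

# log1p(x) computes log(1 + x), so it is defined at x = 0.
log1p_values = [math.log1p(v) for v in values]

# Plain log is undefined at 0; this sketch maps non-positive input to NaN.
# Assumption: the library's handling of such values may differ.
log_values = [math.log(v) if v > 0 else math.nan for v in values]
```

‘log1p’ is therefore the safer choice for columns such as counts that can contain zeros.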

LABEL = 'log'
params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
class logml.data.transformers.lambdas.BinarizationLambdaTransformer(**kwargs)

Bases: logml.data.transformers.lambdas.BaseLambdaTransformer

Binarizes all target columns using a given threshold.
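Thresholding can be sketched in pandas as below. The helper name, column name, and data are hypothetical, and whether the real transformer compares with a strict or non-strict inequality is not documented here; the sketch assumes values strictly above the threshold map to 1.

```python
import pandas as pd


def binarize_sketch(df, columns, threshold):
    """Illustrative only; not the library's implementation."""
    out = df.copy()
    for col in columns:
        # Assumption: values strictly above the threshold map to 1, all others to 0.
        out[col] = (out[col] > threshold).astype(int)
    return out


# Hypothetical data.
df = pd.DataFrame({"score": [0.2, 0.5, 0.9]})
result = binarize_sketch(df, ["score"], threshold=0.5)
```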

LABEL = 'binarization'
CONFIG_CLASS

alias of logml.data.config.BinarizationLambdaTransformerParams

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
class logml.data.transformers.lambdas.ResolveMultipleChoiceTransformer(**kwargs)

Bases: logml.data.transformers.lambdas.BaseLambdaTransformer

Resolves multi-value issue for list-type columns.

LABEL = 'resolve_multiple_choice'
CONFIG_CLASS

alias of logml.data.config.ResolveMultipleChoiceTransformerParams

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
class logml.data.transformers.lambdas.ConvertToFloatTransformer(**kwargs)

Bases: logml.data.transformers.lambdas.BaseLambdaTransformer

Converts column values to floats by removing and parsing special characters. If a value cannot be cast, it is replaced with NaN.

LABEL = 'convert_to_float'
update_metadata(dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None) → None

Update metadata according to the change made.

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]
class logml.data.transformers.lambdas.QueryBooleanTransformer(**kwargs)

Bases: logml.data.transformers.lambdas.BaseLambdaTransformer

Transforms a column to boolean using a query. Puts 1 where the query result is True and 0 otherwise.

Sample config:

data_preprocessing:
    steps:
        - transformer: query_to_bool
          params:
            columns_to_include: ['single_column_here']
            query: "single_column_here == 'YES'"
LABEL = 'query_to_bool'
CONFIG_CLASS

alias of logml.data.config.QueryBooleanTransformerParams

fit(dataframe: pandas.core.frame.DataFrame, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, **kwargs)

There is nothing to fit; this method only performs validation.

transform(dataframe: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Applies transformation using query to boolean.
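The query-to-boolean behaviour from the sample config can be approximated as follows. This is a sketch under the assumption that the query string is a pandas expression evaluated with DataFrame.eval; the helper name and the data are hypothetical, and the real transformer may evaluate the query differently.

```python
import pandas as pd


def query_to_bool_sketch(df, column, query):
    """Illustrative only; assumes the query is evaluated with DataFrame.eval."""
    out = df.copy()
    # For a comparison expression, DataFrame.eval returns a boolean Series.
    mask = out.eval(query)
    # 1 where the query holds, 0 otherwise.
    out[column] = mask.astype(int)
    return out


# Hypothetical data using the column name from the sample config.
df = pd.DataFrame({"single_column_here": ["YES", "NO", "YES"]})
result = query_to_bool_sketch(df, "single_column_here", "single_column_here == 'YES'")
```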

params: BaseTransformerParams
global_params: Dict
metadata_cfg: ModelingTaskSpec
affected_columns_: List[str]