logml.data.config

Classes

DropNaMode(value)

Specifies how to apply DropNA transformation.

class logml.data.config.BaseTransformerParams

Bases: pydantic.main.BaseModel

Defines schema for transformer params.

Column inclusion/exclusion scheme (see also get_affected_columns):

- Build a set as the union of all columns that match any columns_to_include filter.
- Subtract the columns that match any columns_to_exclude filter.

Filtering expressions are identified by prefix:

- ‘re:’ or empty - regular expression. Any valid Python regular expression, e.g. “.*_DNA$”.
- ‘g:’ - column group filter. Must match the group name completely, e.g. “g:clinical_data”.
- ‘$’ - keyword:
    - $features - all features (input columns, covariates).
    - $numeric_features - only numeric features.
    - $cat_features - only categorical features.
    - $target - target feature. (For survival problems this is two columns: time and event.)
    - $all - all columns except key columns.

If no known prefix is detected, the filter is treated as a regular expression.
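The inclusion/exclusion rules above can be sketched in plain Python. This is an illustrative sketch only, not the library's get_affected_columns: the helper names match_filter and resolve_columns, the groups and keywords lookup tables, and the use of re.fullmatch (whole-name matching) are all assumptions made for the example.

```python
import re
from typing import Dict, List


def match_filter(expr: str, columns: List[str],
                 groups: Dict[str, List[str]],
                 keywords: Dict[str, List[str]]) -> List[str]:
    """Return the columns selected by a single filtering expression."""
    if expr.startswith("g:"):
        # Group filter: the group name must match completely.
        members = groups.get(expr[2:], [])
        return [c for c in columns if c in members]
    if expr.startswith("$"):
        # Keyword such as $features or $target (lookup table is hypothetical).
        members = keywords.get(expr, [])
        return [c for c in columns if c in members]
    # 're:' prefix or no known prefix: treat as a regular expression.
    # Assumption: the pattern is matched against the whole column name.
    pattern = expr[3:] if expr.startswith("re:") else expr
    return [c for c in columns if re.fullmatch(pattern, c)]


def resolve_columns(columns: List[str], include: List[str], exclude: List[str],
                    groups: Dict[str, List[str]],
                    keywords: Dict[str, List[str]]) -> List[str]:
    """Union of all include matches minus all exclude matches."""
    selected = set()
    for expr in include:
        selected.update(match_filter(expr, columns, groups, keywords))
    for expr in exclude:
        selected.difference_update(match_filter(expr, columns, groups, keywords))
    # Preserve the original column order in the result.
    return [c for c in columns if c in selected]


columns = ["TP53_DNA", "KRAS_DNA", "age", "sex", "os_time"]
groups = {"clinical_data": ["age", "sex"]}
keywords = {"$target": ["os_time"]}

picked = resolve_columns(columns, [".*_DNA$", "g:clinical_data"], ["re:KRAS.*"],
                         groups, keywords)
```

With the sample data, the include filters select the two DNA columns plus the clinical group, and the exclude filter then removes KRAS_DNA.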

Show JSON schema

{
   "title": "BaseTransformerParams",
   "description": "Defines schema for transformer params.\n\nColumns inclusion/exclusion schema (see also `get_affected_columns`):\n\n- make set by union all columns that match `include_columns` filter.\n- subtract columns that match `exclude_columns` filter.\n\nFiltering expressions are identified by prefix:\n\n- 're:' or empty - regular expression. Any valid python regular expression, e.g. \".*_DNA$\"\n- 'g:' - columns' group filter. Should completely match group name, e.g. \"g:clinical_data\".\n- '$' - keyword:\n    - $features - all features (input columns, covariates).\n    - $numeric_features - only numeric features.\n    - $cat_features - only categorical features.\n    - $target - target feature. (For survival problems will be two columns - time+event).\n    - $all - all columns except key columns.\n\nIf no know prefix detected, the filter is considered as regular expression.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      }
   }
}

Fields

columns_to_exclude (List[str])
columns_to_include (List[str])

field columns_to_include: List[str] = ['.*']: List of filtering expressions. By default, all columns are included.

field columns_to_exclude: List[str] = []: List of filtering expressions. Empty by default.

class logml.data.config.FillNaTransformerParams

Bases: logml.data.config.BaseTransformerParams

FillNaTransformer params

Show JSON schema

{
   "title": "FillNaTransformerParams",
   "description": "FillNaTransformer params",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "constant": {
         "title": "Constant",
         "description": "Value to replace NaN values with.",
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "number"
            },
            {
               "type": "string"
            }
         ]
      }
   },
   "required": [
      "constant"
   ]
}

Fields

constant (Union[int, float, str])

field constant: Union[int, float, str] [Required]: Value to replace NaN values with.
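As a rough illustration of what the constant parameter controls, the sketch below replaces missing values in selected columns of a plain dict-of-lists table. The fill_na helper, the table layout, and the treatment of None as missing are assumptions for this example; the actual transformer may work differently.

```python
import math
from typing import Dict, List, Union

Scalar = Union[int, float, str]


def fill_na(table: Dict[str, list], columns: List[str],
            constant: Scalar) -> Dict[str, list]:
    """Replace NaN/None with `constant` in the given columns only."""
    def is_missing(value) -> bool:
        return value is None or (isinstance(value, float) and math.isnan(value))

    result = {}
    for name, values in table.items():
        if name in columns:
            result[name] = [constant if is_missing(v) else v for v in values]
        else:
            # Columns outside the selection are copied unchanged.
            result[name] = list(values)
    return result


table = {"age": [30.0, float("nan")], "sex": ["F", None]}
filled = fill_na(table, columns=["age"], constant=0)
```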

class logml.data.config.BucketDefinition

Bases: pydantic.main.BaseModel

Defines a bucket for numerical values.

Bucket: (left_bound, right_bound].

NOTE: The left and right bounds can each be included or excluded via include_left_bound and include_right_bound.

Show JSON schema

{
   "title": "BucketDefinition",
   "description": "Defines a bucket for numerical values.\n\nBucket: (left_bound, right_bound].\n\nNOTE: left and right bounds might be included/excluded if needed.",
   "type": "object",
   "properties": {
      "left_bound": {
         "title": "Left Bound",
         "description": "Defines a left bound for the bucket.",
         "default": NaN,
         "type": "number"
      },
      "right_bound": {
         "title": "Right Bound",
         "description": "Defines a right bound for the bucket.",
         "default": NaN,
         "type": "number"
      },
      "include_left_bound": {
         "title": "Include Left Bound",
         "description": "Whether to include left bound to bucket range.",
         "default": true,
         "type": "boolean"
      },
      "include_right_bound": {
         "title": "Include Right Bound",
         "description": "Whether to include right bound to bucket range.",
         "default": true,
         "type": "boolean"
      },
      "alias": {
         "title": "Alias",
         "description": "Defines an alias for the bucket.",
         "type": "string"
      }
   },
   "required": [
      "alias"
   ]
}

Fields

alias (str)
include_left_bound (bool)
include_right_bound (bool)
left_bound (float)
right_bound (float)

field left_bound: float = nan: Defines a left bound for the bucket.

field right_bound: float = nan: Defines a right bound for the bucket.

field include_left_bound: bool = True: Whether to include the left bound in the bucket range.

field include_right_bound: bool = True: Whether to include the right bound in the bucket range.

field alias: str [Required]: Defines an alias for the bucket.
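The interaction between the bounds and the two include flags can be sketched with a small stand-in class. The Bucket dataclass and its contains method are hypothetical and only mirror the field names above; in particular, treating a NaN bound as "unbounded on that side" is an assumption, since the reference does not state how the NaN defaults are interpreted.

```python
import math
from dataclasses import dataclass


@dataclass
class Bucket:
    """Stand-in mirroring the BucketDefinition fields (not the real class)."""
    alias: str
    left_bound: float = math.nan
    right_bound: float = math.nan
    include_left_bound: bool = True
    include_right_bound: bool = True

    def contains(self, value: float) -> bool:
        # Assumption: a NaN bound means there is no limit on that side.
        if not math.isnan(self.left_bound):
            if value < self.left_bound:
                return False
            if value == self.left_bound and not self.include_left_bound:
                return False
        if not math.isnan(self.right_bound):
            if value > self.right_bound:
                return False
            if value == self.right_bound and not self.include_right_bound:
                return False
        return True


# (10, 20]: left bound excluded, right bound included.
mid = Bucket(alias="mid", left_bound=10, right_bound=20, include_left_bound=False)
# Everything below 10, with 10 itself excluded.
low = Bucket(alias="low", right_bound=10, include_right_bound=False)
```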

class logml.data.config.BucketizeTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for ‘bucketize’ transformer params.

Show JSON schema

{
   "title": "BucketizeTransformerParams",
   "description": "Defines schema for 'bucketize' transformer params.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "suffix": {
         "title": "Suffix",
         "description": "Suffix that will be appended to the base column name for naming the result column.",
         "default": "__bucketized",
         "type": "string"
      },
      "buckets": {
         "title": "Buckets",
         "description": "Defines a list of buckets for transforming target column.",
         "default": [],
         "type": "array",
         "items": {
            "$ref": "#/definitions/BucketDefinition"
         }
      },
      "remove_base_columns": {
         "title": "Remove Base Columns",
         "description": "Whether base columns should be removed.",
         "default": true,
         "type": "boolean"
      }
   },
   "definitions": {
      "BucketDefinition": {
         "title": "BucketDefinition",
         "description": "Defines a bucket for numerical values.\n\nBucket: (left_bound, right_bound].\n\nNOTE: left and right bounds might be included/excluded if needed.",
         "type": "object",
         "properties": {
            "left_bound": {
               "title": "Left Bound",
               "description": "Defines a left bound for the bucket.",
               "default": NaN,
               "type": "number"
            },
            "right_bound": {
               "title": "Right Bound",
               "description": "Defines a right bound for the bucket.",
               "default": NaN,
               "type": "number"
            },
            "include_left_bound": {
               "title": "Include Left Bound",
               "description": "Whether to include left bound to bucket range.",
               "default": true,
               "type": "boolean"
            },
            "include_right_bound": {
               "title": "Include Right Bound",
               "description": "Whether to include right bound to bucket range.",
               "default": true,
               "type": "boolean"
            },
            "alias": {
               "title": "Alias",
               "description": "Defines an alias for the bucket.",
               "type": "string"
            }
         },
         "required": [
            "alias"
         ]
      }
   }
}

Fields

buckets (List[logml.data.config.BucketDefinition])
remove_base_columns (bool)
suffix (str)

field suffix: str = '__bucketized': Suffix that will be appended to the base column name for naming the result column.

field buckets: List[logml.data.config.BucketDefinition] = []: Defines a list of buckets for transforming target column.

field remove_base_columns: bool = True: Whether base columns should be removed.
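A minimal sketch of how suffix, buckets, and remove_base_columns might fit together, using plain dicts that carry the BucketDefinition field names. The bucketize function, the dict-of-lists table, and the choice to return None for values that fall in no bucket are assumptions for illustration, not the library's implementation.

```python
from typing import Dict, List, Optional


def bucketize(table: Dict[str, list], column: str, buckets: List[dict],
              suffix: str = "__bucketized",
              remove_base_columns: bool = True) -> Dict[str, list]:
    """Map numeric values of `column` to bucket aliases in a new column."""
    def alias_for(value: float) -> Optional[str]:
        for b in buckets:
            above = value > b["left_bound"] or (
                b.get("include_left_bound", True) and value == b["left_bound"])
            below = value < b["right_bound"] or (
                b.get("include_right_bound", True) and value == b["right_bound"])
            if above and below:
                return b["alias"]
        # Assumption: values outside every bucket are left missing.
        return None

    result = dict(table)
    # The result column is named by appending the suffix to the base name.
    result[column + suffix] = [alias_for(v) for v in table[column]]
    if remove_base_columns:
        del result[column]
    return result


buckets = [
    {"alias": "child", "left_bound": 0, "right_bound": 10,
     "include_right_bound": False},
    {"alias": "adult", "left_bound": 10, "right_bound": 120},
]
table = {"age": [5, 10, 42]}
bucketized = bucketize(table, "age", buckets)
```

Because the first bucket excludes its right bound, the value 10 falls into the second bucket.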

class logml.data.config.DropColumnsTransformerParams

Bases: logml.data.config.BaseTransformerParams

Parameters for drop_columns transformer.

Show JSON schema

{
   "title": "DropColumnsTransformerParams",
   "description": "Parameters for `drop_columns` transformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "dtypes_to_include": {
         "title": "Dtypes To Include",
         "description": "List of data types. Affected columns are additionally filtered to match these types. When empty, types filter is not applied. Higher level data kinds can be used (see py:ref:`DtypeKind`), such as \"i: for integer, \"f\" for float and so on.Most frequent options are `object`, `int64`, `float64`, `datetime64[ns]`.\n\nSee `https://pandas.pydata.org/docs/user_guide/basics.html#basics-dtypes` for thelist of available standard pandas types.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      }
   }
}

Fields

dtypes_to_include (List[str])

field dtypes_to_include: List[str] = []: List of data types. Affected columns are additionally filtered to match these types. When empty, the type filter is not applied. Higher-level data kinds can be used (see DtypeKind), such as “i” for integer, “f” for float, and so on. The most frequent options are object, int64, float64, datetime64[ns]. See https://pandas.pydata.org/docs/user_guide/basics.html#basics-dtypes for the list of available standard pandas types.

class logml.data.config.DecompositionTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for decomposition transformers (PCA, NMF).

Show JSON schema

{
   "title": "DecompositionTransformerParams",
   "description": "Defines schema for decomposition transformers (PCA, NMF).",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "prefix": {
         "title": "Prefix",
         "type": "string"
      }
   },
   "required": [
      "prefix"
   ]
}

Fields

inner_params (Dict)
prefix (str)

field inner_params: Dict = {}

field prefix: str [Required]

class logml.data.config.EncodingTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for encoding transformers (one-hot, label, etc.).

Show JSON schema

{
   "title": "EncodingTransformerParams",
   "description": "Defines schema for encoding transformers (one-hot, label, etc.).",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "scope": {
         "title": "Scope",
         "default": "local",
         "type": "string"
      }
   }
}

Fields

inner_params (Dict)
scope (str)

field inner_params: Dict = {}

field scope: str = 'local'

class logml.data.config.MultiLabelOneHotTransformerParams

Bases: logml.data.config.EncodingTransformerParams

Defines schema for multilabel one-hot encoding transformer.

Show JSON schema

{
   "title": "MultiLabelOneHotTransformerParams",
   "description": "Defines schema for multilabel one-hot encoding transformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "scope": {
         "title": "Scope",
         "default": "local",
         "type": "string"
      },
      "separator": {
         "title": "Separator",
         "default": ",",
         "type": "string"
      }
   }
}

Fields

separator (str)

field separator: str = ','
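The role of separator can be illustrated with a small multilabel one-hot sketch. The multilabel_one_hot helper, the whitespace stripping, and the naming of output columns after the labels are assumptions for this example; the reference does not specify how the real transformer names its result columns.

```python
from typing import Dict, List


def multilabel_one_hot(values: List[str], separator: str = ",") -> Dict[str, List[int]]:
    """Split each cell on `separator` and emit one 0/1 column per label."""
    split = [[part.strip() for part in v.split(separator) if part.strip()]
             for v in values]
    labels = sorted({label for row in split for label in row})
    return {label: [int(label in row) for row in split] for label in labels}


encoded = multilabel_one_hot(["a,b", "b", ""])
```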

class logml.data.config.CategoricalsEncodingTransformerParams

Bases: logml.data.config.MultiLabelOneHotTransformerParams

Defines underlying encoder to use for categoricals.

Show JSON schema

{
   "title": "CategoricalsEncodingTransformerParams",
   "description": "Defines underlying encoder to use for categoricals.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "scope": {
         "title": "Scope",
         "default": "local",
         "type": "string"
      },
      "separator": {
         "title": "Separator",
         "default": ",",
         "type": "string"
      },
      "encoding": {
         "title": "Encoding",
         "type": "string"
      }
   },
   "required": [
      "encoding"
   ]
}

Fields

encoding (str)

field encoding: str [Required]

class logml.data.config.MapEncodingTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for MapEncodingTransformer.

Show JSON schema

{
   "title": "MapEncodingTransformerParams",
   "description": "Defines schema for MapEncodingTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "mapping": {
         "title": "Mapping",
         "type": "object"
      },
      "unknown_values": {
         "title": "Unknown Values",
         "default": NaN,
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "integer"
            },
            {
               "type": "string"
            }
         ]
      }
   },
   "required": [
      "mapping"
   ]
}

Fields

mapping (Dict)
unknown_values (Union[float, int, str])

field mapping: Dict [Required]

field unknown_values: Union[float, int, str] = nan
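A minimal sketch of the mapping and unknown_values semantics, assuming that any value absent from mapping is replaced by unknown_values. The map_encode helper is hypothetical and only illustrates the two fields.

```python
import math
from typing import Dict, List, Union

Scalar = Union[float, int, str]


def map_encode(values: List[Scalar], mapping: Dict[Scalar, Scalar],
               unknown_values: Scalar = math.nan) -> List[Scalar]:
    """Replace each value by mapping[value], or unknown_values if absent."""
    return [mapping.get(v, unknown_values) for v in values]


encoded = map_encode(["low", "high", "?"], {"low": 0, "high": 1}, unknown_values=-1)
```

With the default unknown_values, unmapped entries become NaN, mirroring the field default above.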

class logml.data.config.FilteringTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for typical FilteringTransformer.

Show JSON schema

{
   "title": "FilteringTransformerParams",
   "description": "Defines schema for typical FilteringTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "threshold": {
         "title": "Threshold",
         "type": "number"
      }
   },
   "required": [
      "threshold"
   ]
}

Fields

threshold (float)

field threshold: float [Required]

class logml.data.config.PrevalenceFilteringTransformerParams

Bases: logml.data.config.BaseTransformerParams

Parameters for prevalence_filtering transformer.

See PrevalenceFilteringTransformer for details.

Show JSON schema

{
   "title": "PrevalenceFilteringTransformerParams",
   "description": "Parameters for `prevalence_filtering` transformer.\n\nSee `PrevalenceFilteringTransformer` for details.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "threshold": {
         "title": "Threshold",
         "type": "number"
      },
      "values": {
         "title": "Values",
         "type": "array",
         "items": {}
      }
   },
   "required": [
      "threshold",
      "values"
   ]
}

Fields

threshold (float)
values (List)

field threshold: float [Required]

field values: List [Required]

class logml.data.config.MutationsFilteringTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for typical FilteringTransformer that uses mutations.

Show JSON schema

{
   "title": "MutationsFilteringTransformerParams",
   "description": "Defines schema for typical FilteringTransformer that uses mutations.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "mutations": {
         "title": "Mutations",
         "type": "array",
         "items": {
            "type": "string"
         }
      }
   },
   "required": [
      "mutations"
   ]
}

Fields

mutations (List[str])

field mutations: List[str] [Required]

class logml.data.config.MICETransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for MICE imputing transformer.

Show JSON schema

{
   "title": "MICETransformerParams",
   "description": "Defines schema for MICE imputing transformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "random_state": {
         "title": "Random State",
         "type": "integer"
      },
      "n_nearest_features": {
         "title": "N Nearest Features",
         "default": 10,
         "type": "integer"
      },
      "max_iter": {
         "title": "Max Iter",
         "default": 20,
         "type": "integer"
      },
      "verbose": {
         "title": "Verbose",
         "default": 0,
         "type": "integer"
      },
      "sample_posterior": {
         "title": "Sample Posterior",
         "default": false,
         "type": "boolean"
      }
   }
}

Fields

random_state (Optional[int])

field random_state: Optional[int] = None

class logml.data.config.ImputingTransformerParams

Bases: logml.data.config.EncodingTransformerParams

Defines underlying imputer to use for target columns.

Show JSON schema

{
   "title": "ImputingTransformerParams",
   "description": "Defines underlying imputer to use for target columns.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "scope": {
         "title": "Scope",
         "default": "local",
         "type": "string"
      },
      "imputation": {
         "title": "Imputation",
         "type": "string"
      },
      "imputation_params": {
         "title": "Imputation Params",
         "default": {},
         "type": "object"
      }
   },
   "required": [
      "imputation"
   ]
}

Fields

imputation (str)
imputation_params (Optional[dict])

field imputation: str [Required]

field imputation_params: Optional[dict] = {}

class logml.data.config.BinarizationLambdaTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for BinarizationLambdaTransformer.

Show JSON schema

{
   "title": "BinarizationLambdaTransformerParams",
   "description": "Defines schema for BinarizationLambdaTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "threshold": {
         "title": "Threshold",
         "type": "number"
      }
   },
   "required": [
      "threshold"
   ]
}

Fields

threshold (float)

field threshold: float [Required]

class logml.data.config.QueryBooleanTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for QueryBooleanTransformer.

Show JSON schema

{
   "title": "QueryBooleanTransformerParams",
   "description": "Defines schema for QueryBooleanTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "query": {
         "title": "Query",
         "type": "string"
      }
   },
   "required": [
      "query"
   ]
}

Fields

query (str)

field query: str [Required]

class logml.data.config.NormalizationTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines underlying normalizer to use for target columns.

Show JSON schema

{
   "title": "NormalizationTransformerParams",
   "description": "Defines underlying normalizer to use for target columns.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "normalization": {
         "title": "Normalization",
         "type": "string"
      },
      "params": {
         "title": "Params",
         "default": {},
         "type": "object"
      }
   },
   "required": [
      "normalization"
   ]
}

Fields

normalization (str)
params (dict)

field normalization: str [Required]

field params: dict = {}

class logml.data.config.AddRandomColumnsTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for AddRandomColumnsTransformer.

Show JSON schema

{
   "title": "AddRandomColumnsTransformerParams",
   "description": "Defines schema for AddRandomColumnsTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "fraction": {
         "title": "Fraction",
         "type": "number"
      }
   },
   "required": [
      "fraction"
   ]
}

Fields

fraction (float)

field fraction: float [Required]

class logml.data.config.DropNaMode(value)

Bases: str, enum.Enum

Specifies how to apply DropNA transformation.

all - drop a row when all columns are NA; any - when at least one is NA; threshold - when a specified number or percentage of columns is NA.

ALL = 'all'

ANY = 'any'

THRESHOLD = 'threshold'

class logml.data.config.DropNanRowsTransformerParams

Bases: logml.data.config.BaseTransformerParams

Configuration for drop_nan_rows transformer.

Show JSON schema

{
   "title": "DropNanRowsTransformerParams",
   "description": "Configuration for `drop_nan_rows` transformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "threshold": {
         "title": "Threshold",
         "default": 1.0,
         "exclusiveMinimum": 0.0,
         "help": "Determine >= threshold for count nan columns. If float from 0 to 1, defines ratio. If integer >= 1, then defines number columns.",
         "type": "number"
      },
      "how": {
         "default": "all",
         "help": "Determine if row is removed when we have at least one NA or all NA.\n            - `any` : If any NA values are present, drop that row.\n            - `all` : If all values are NA, drop that row.\n            - `threshold`: Use threshold to define ratio of NA values.\n        ",
         "allOf": [
            {
               "$ref": "#/definitions/DropNaMode"
            }
         ]
      }
   },
   "definitions": {
      "DropNaMode": {
         "title": "DropNaMode",
         "description": "Specifies how to apply DropNA transformation.\n\n`all` - when all columns are NA, `any` - when at least one is NA,\n`threshold` - when specified number or percentage is NA.",
         "enum": [
            "all",
            "any",
            "threshold"
         ],
         "type": "string"
      }
   }
}

Fields

how (logml.data.config.DropNaMode)
threshold (float)

field threshold: float = 1.0

Constraints

exclusiveMinimum = 0.0
help = Threshold for the count of NaN columns, compared with >=. A float from 0 to 1 defines a ratio of columns; an integer >= 1 defines a number of columns.

field how: logml.data.config.DropNaMode = DropNaMode.ALL

Constraints

help = Determines whether a row is removed when at least one value is NA or when all values are NA.
any - if any NA values are present, drop that row.
all - if all values are NA, drop that row.
threshold - use threshold to define the ratio of NA values.
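
The three how modes and the threshold field can be illustrated with a small sketch. This is not logml's implementation: is_na and should_drop_row are hypothetical helpers, and reading a threshold below 1 as a ratio of columns and a value of 1 or more as a column count is an assumption based on the help text above.

```python
import math
from typing import Sequence


def is_na(value: object) -> bool:
    """Treat None and float NaN as missing (hypothetical helper)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def should_drop_row(row: Sequence[object], how: str = "all", threshold: float = 1.0) -> bool:
    """Decide whether a row would be dropped under the given mode."""
    na_count = sum(is_na(v) for v in row)
    if how == "any":
        return na_count >= 1
    if how == "all":
        return na_count == len(row)
    if how == "threshold":
        # Assumption: values below 1 are a ratio of columns,
        # values of 1 or more are an absolute number of columns.
        required = threshold * len(row) if threshold < 1 else threshold
        return na_count >= required
    raise ValueError(f"unknown mode: {how!r}")


rows = [
    [1.0, 2.0, 3.0, 4.0],             # no NA
    [1.0, None, 3.0, 4.0],            # one NA
    [None, float("nan"), 3.0, 4.0],   # two NA
    [None, None, None, None],         # all NA
]

kept_any = [r for r in rows if not should_drop_row(r, how="any")]
kept_all = [r for r in rows if not should_drop_row(r, how="all")]
kept_ratio = [r for r in rows if not should_drop_row(r, how="threshold", threshold=0.5)]
kept_count = [r for r in rows if not should_drop_row(r, how="threshold", threshold=3)]
```

With these four rows, any keeps only the complete row, all removes only the fully empty row, and the two threshold calls remove rows with at least half of the columns missing and at least three columns missing, respectively.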

class logml.data.config.ResolveMultipleChoiceTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines parameters for ResolveMultipleChoiceTransformer.

Show JSON schema

{
   "title": "ResolveMultipleChoiceTransformerParams",
   "description": "Defines parameters for ResolveMultipleChoiceTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "keep_first_value": {
         "title": "Keep First Value",
         "default": true,
         "type": "boolean"
      },
      "delimeter": {
         "title": "Delimeter",
         "default": ",",
         "type": "string"
      }
   }
}

Fields

delimeter (str)
keep_first_value (bool)

field keep_first_value: bool = True

field delimeter: str = ','
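
The page does not describe the transformer's behaviour, so the following is only a guess at what the two fields control: a cell holding several delimited values is split on delimeter, and one value is kept. The function resolve_multiple_choice is hypothetical, and keeping the last value when keep_first_value is False is an assumption that the source does not confirm.

```python
from typing import Any


def resolve_multiple_choice(value: Any, keep_first_value: bool = True, delimeter: str = ",") -> Any:
    """Reduce a delimited multi-value cell to a single value (illustrative only)."""
    # Non-string cells and cells without the delimiter are returned unchanged.
    if not isinstance(value, str) or delimeter not in value:
        return value
    tokens = value.split(delimeter)
    # Assumption: keep_first_value=False keeps the last token instead.
    return tokens[0] if keep_first_value else tokens[-1]


first = resolve_multiple_choice("A,B,C")
last = resolve_multiple_choice("A,B,C", keep_first_value=False)
custom = resolve_multiple_choice("x;y", delimeter=";")
```

The parameter name delimeter is spelled as in the schema so that the sketch lines up with the documented field.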

class logml.data.config.RemoveCorrelatedColumnsParams

Bases: logml.data.config.BaseTransformerParams

Defines thresholds used for removing correlated columns.

Show JSON schema

{
   "title": "RemoveCorrelatedColumnsParams",
   "description": "Defines thresholds that will be used for Correlated columns removal.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "correlation_type": {
         "description": "Type of correlation that will be used for removing correlated features.",
         "default": "spearman",
         "allOf": [
            {
               "$ref": "#/definitions/CorrelationType"
            }
         ]
      },
      "correlation_threshold": {
         "title": "Correlation Threshold",
         "description": "Defines a correlation threshold that will be used to identify \"correlated\" features.",
         "default": 0.9,
         "type": "number"
      },
      "correlation_min_samples_fraction": {
         "title": "Correlation Min Samples Fraction",
         "description": "Additional parameter that defines the minimum fraction of samples that is required to calculate\n            correlation coefficient between two columns. As NaNs are ignored and correlation coefficient is calculated\n            on top of non-NaN subset of rows for a pair of columns - this parameter could help to make the results\n            more meaningful. Please see the reference of \"min_periods\" here:\n            https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html\n            ",
         "default": 0.3,
         "type": "number"
      },
      "correlation_group_level_cutoff": {
         "title": "Correlation Group Level Cutoff",
         "description": "Sets cutoff for how many levels of neighbours to consider when building correlation groups.\n\n        For example consider the following correlation matrix:\n\n        .. code-block::\n\n                    a    b    c    d\n                a  1.0  0.8  0.8  0.7\n                b  0.8  1.0    0    0\n                c  0.8    0  1.0  0.8\n                d  0.7    0  0.8  1.0\n\n        Let's say, we use threshold as ``> 0.7``. In this case `a` is correlated strongly with `b` and `c`, and \n        `c` correlated with `d`.\n\n        When we set cutoff to `1`, we use direct neighbours only, so there is one group `'a', 'c', 'b'`. \n        In this case `d` is not included, because the group has been already formed around `a` column.\n\n        If we set it to `-1` or anything more than 1, we use all reachable neighbours. In this case, correlation \n        group is formed as ``'a', 'c', 'b', 'd'`` due to fact that `d` is strongly correlated with `c`, disregarding \n        it weak connection to `a`. As you can see, it will result in larger groups, and possibility to assign to the \n        same group columns with correlation less than a threshold. It could reflect cross-correlation more\n        naturally in some cases.\n        ",
         "default": 1,
         "type": "integer"
      },
      "correlation_key_names": {
         "title": "Correlation Key Names",
         "description": "Defines a list of biologically rational gene names (subst) that\n            will be used for correlation groups naming. In case some of those names will appear in one of column names\n            within the same correlation group - the result correlation group identifier will contain those names.",
         "default": [
            "TP53",
            "KRAS",
            "CDKN2A",
            "CDKN2B",
            "PIK3CA",
            "ATM",
            "BRCA1",
            "SOX2",
            "GNAS2",
            "TERC",
            "STK11",
            "PDCD1",
            "LAG3",
            "TIGIT",
            "HAVCR2",
            "EOMES",
            "MTAP"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      }
   },
   "definitions": {
      "CorrelationType": {
         "title": "CorrelationType",
         "description": "Defines available correlation types.",
         "enum": [
            "pearson",
            "spearman"
         ],
         "type": "string"
      }
   }
}

Fields

correlation_group_level_cutoff (int)
correlation_key_names (List[str])
correlation_min_samples_fraction (float)
correlation_threshold (float)
correlation_type (logml.configuration.eda.CorrelationType)

field correlation_type: logml.configuration.eda.CorrelationType = CorrelationType.SPEARMAN: Type of correlation that will be used for removing correlated features.

field correlation_threshold: float = 0.9: Defines a correlation threshold that will be used to identify “correlated” features.

field correlation_min_samples_fraction: float = 0.3: Minimum fraction of samples required to calculate the correlation coefficient between two columns. Because NaNs are ignored and the coefficient is calculated on the non-NaN subset of rows for each pair of columns, this parameter can help make the results more meaningful. See the reference for “min_periods” here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html
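
To make the effect of the minimum samples fraction concrete, here is a minimal standard-library sketch. It uses Pearson correlation for brevity (the default type above is spearman), the function name pearson_with_min_fraction is hypothetical, and converting the fraction to a row count by rounding up is an assumption; in pandas the comparable control is the min_periods argument of DataFrame.corr.

```python
import math
from typing import Optional, Sequence


def pearson_with_min_fraction(
    x: Sequence[float],
    y: Sequence[float],
    min_samples_fraction: float = 0.3,
) -> Optional[float]:
    """Pearson correlation over rows where both values are present.

    Returns None when too few complete rows are available.
    """
    # Keep only rows where neither value is NaN.
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    # Assumption: the fraction is converted to a row count by rounding up.
    min_periods = max(2, math.ceil(min_samples_fraction * len(x)))
    if len(pairs) < min_periods:
        return None
    mean_x = sum(a for a, _ in pairs) / len(pairs)
    mean_y = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


nan = float("nan")
x = [1.0, 2.0, 3.0, 4.0, nan, nan, nan, nan, nan, nan]
y = [2.0, 4.0, 6.0, 8.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]

# Only 4 of 10 rows are complete: enough for 0.3, not enough for 0.5.
r_default = pearson_with_min_fraction(x, y)
r_strict = pearson_with_min_fraction(x, y, min_samples_fraction=0.5)
```

Here the coefficient is computed from the four complete rows under the default fraction, while the stricter fraction yields no coefficient at all, which keeps a pair with little overlapping data from being flagged as correlated.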

field correlation_group_level_cutoff: int = 1: Sets the cutoff for how many levels of neighbours to consider when building correlation groups.

For example, consider the following correlation matrix:

        a    b    c    d
    a  1.0  0.8  0.8  0.7
    b  0.8  1.0    0    0
    c  0.8    0  1.0  0.8
    d  0.7    0  0.8  1.0

Suppose the threshold is > 0.7. In this case a is strongly correlated with b and c, and c is correlated with d.

When the cutoff is set to 1, only direct neighbours are used, so there is one group: 'a', 'c', 'b'. Here d is not included, because the group has already been formed around column a.

When the cutoff is set to -1 or anything more than 1, all reachable neighbours are used. In this case the correlation group is formed as 'a', 'c', 'b', 'd', because d is strongly correlated with c, regardless of its weak connection to a. This results in larger groups and makes it possible for columns whose correlation is below the threshold to be assigned to the same group. In some cases this may reflect cross-correlation more naturally.
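
The grouping described for this field can be sketched as a breadth-first search with a depth limit over the "correlated" graph. This is an illustration, not logml's algorithm: correlation_groups is a hypothetical function, comparing absolute values and omitting single-column groups are assumptions, and so is treating a cutoff above 1 as a depth limit (the description says any value above 1 uses all reachable neighbours; for this matrix both readings give the same result).

```python
from collections import deque
from typing import Dict, List


def correlation_groups(
    corr: Dict[str, Dict[str, float]],
    threshold: float = 0.7,
    level_cutoff: int = 1,
) -> List[List[str]]:
    """Group columns by walking correlated neighbours up to level_cutoff levels."""
    columns = list(corr)
    assigned = set()
    groups = []
    for seed in columns:
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        queue = deque([(seed, 0)])
        while queue:
            column, level = queue.popleft()
            # A negative cutoff means no depth limit.
            if 0 <= level_cutoff <= level:
                continue
            for other in columns:
                # Assumption: strength is judged on the absolute value.
                if other not in assigned and abs(corr[column][other]) > threshold:
                    assigned.add(other)
                    group.append(other)
                    queue.append((other, level + 1))
        # Assumption: a column left on its own does not form a group.
        if len(group) > 1:
            groups.append(group)
    return groups


corr = {
    "a": {"a": 1.0, "b": 0.8, "c": 0.8, "d": 0.7},
    "b": {"a": 0.8, "b": 1.0, "c": 0.0, "d": 0.0},
    "c": {"a": 0.8, "b": 0.0, "c": 1.0, "d": 0.8},
    "d": {"a": 0.7, "b": 0.0, "c": 0.8, "d": 1.0},
}

direct_only = correlation_groups(corr, threshold=0.7, level_cutoff=1)
all_reachable = correlation_groups(corr, threshold=0.7, level_cutoff=-1)
```

With the matrix from the description, a cutoff of 1 groups a, b and c and leaves d out, while a cutoff of -1 pulls d in through its link to c.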

field correlation_key_names: List[str] = ['TP53', 'KRAS', 'CDKN2A', 'CDKN2B', 'PIK3CA', 'ATM', 'BRCA1', 'SOX2', 'GNAS2', 'TERC', 'STK11', 'PDCD1', 'LAG3', 'TIGIT', 'HAVCR2', 'EOMES', 'MTAP']: Defines a list of biologically meaningful gene names (substrings) that are used for naming correlation groups. If any of these names appears in a column name within a correlation group, the resulting correlation group identifier will contain those names.
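
As a rough illustration of the naming rule, the sketch below finds which key names occur as substrings of the column names in one group. The helper matched_key_names and the example column names are hypothetical, case-sensitive matching is an assumption, and the actual format of the group identifier is not documented here, so the sketch stops at the list of matched names.

```python
from typing import List, Sequence

# Default key names copied from the field definition above.
DEFAULT_KEY_NAMES = [
    "TP53", "KRAS", "CDKN2A", "CDKN2B", "PIK3CA", "ATM", "BRCA1", "SOX2", "GNAS2",
    "TERC", "STK11", "PDCD1", "LAG3", "TIGIT", "HAVCR2", "EOMES", "MTAP",
]


def matched_key_names(group_columns: Sequence[str], key_names: Sequence[str]) -> List[str]:
    """Return key names that occur as a substring of any column in the group."""
    # Assumption: matching is case-sensitive and keeps the order of key_names.
    return [name for name in key_names if any(name in column for column in group_columns)]


group = ["TP53_mutation", "expr_KRAS", "age"]
found = matched_key_names(group, DEFAULT_KEY_NAMES)
none_found = matched_key_names(["age", "bmi"], DEFAULT_KEY_NAMES)
```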