logml.data.config

Classes

DropNaMode(value)

Specifies how to apply DropNA transformation.

class logml.data.config.BaseTransformerParams

Bases: pydantic.main.BaseModel

Defines schema for transformer params.

Column inclusion/exclusion scheme (see also get_affected_columns):

  • build a set as the union of all columns that match the columns_to_include filters.

  • subtract the columns that match the columns_to_exclude filters.

Filtering expressions are identified by prefix:

  • ‘re:’ or empty - regular expression. Any valid Python regular expression, e.g. “.*_DNA$”

  • ‘g:’ - columns’ group filter. Should completely match group name, e.g. “g:clinical_data”.

  • ‘$’ - keyword:
    • $features - all features (input columns, covariates).

    • $numeric_features - only numeric features.

    • $cat_features - only categorical features.

    • $target - target feature (for survival problems this is two columns: time and event).

    • $all - all columns except key columns.

If no known prefix is detected, the filter is treated as a regular expression.
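
The include/exclude rules above can be sketched in plain Python. This is only an illustration, not the library's get_affected_columns: the function name resolve_columns, the groups and keywords lookup tables, and the use of re.match (rather than search or full match) are all assumptions made for the example.

```python
import re
from typing import Dict, Iterable, List, Optional, Set


def resolve_columns(
    columns: List[str],
    include: Iterable[str] = (".*",),
    exclude: Iterable[str] = (),
    groups: Optional[Dict[str, List[str]]] = None,    # hypothetical: group name -> columns
    keywords: Optional[Dict[str, List[str]]] = None,  # hypothetical: "$features" -> columns
) -> List[str]:
    """Union of include matches minus exclude matches, in original column order."""
    groups = groups or {}
    keywords = keywords or {}

    def match(expr: str) -> Set[str]:
        if expr.startswith("g:"):
            # Group filter: the group name must match completely.
            return set(groups.get(expr[2:], []))
        if expr.startswith("$"):
            # Keyword filter such as $features or $target.
            return set(keywords.get(expr, []))
        # 're:' prefix or no known prefix: treat as a regular expression.
        pattern = expr[3:] if expr.startswith("re:") else expr
        # Assumption: patterns are anchored at the start, as with re.match.
        return {c for c in columns if re.match(pattern, c)}

    included = set().union(*(match(e) for e in include))
    excluded = set().union(*(match(e) for e in exclude))
    return [c for c in columns if c in included and c not in excluded]


columns = ["TP53_DNA", "KRAS_DNA", "age", "sex"]
groups = {"clinical_data": ["age", "sex"]}
selected = resolve_columns(
    columns,
    include=[".*_DNA$", "g:clinical_data"],
    exclude=["re:^KRAS"],
    groups=groups,
)
print(selected)
```

Here the two include filters are unioned first, and the exclude filter is subtracted afterwards, so an excluded column is removed even if several include filters matched it.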

Show JSON schema
{
   "title": "BaseTransformerParams",
   "description": "Defines schema for transformer params.\n\nColumns inclusion/exclusion schema (see also `get_affected_columns`):\n\n- make set by union all columns that match `include_columns` filter.\n- subtract columns that match `exclude_columns` filter.\n\nFiltering expressions are identified by prefix:\n\n- 're:' or empty - regular expression. Any valid python regular expression, e.g. \".*_DNA$\"\n- 'g:' - columns' group filter. Should completely match group name, e.g. \"g:clinical_data\".\n- '$' - keyword:\n    - $features - all features (input columns, covariates).\n    - $numeric_features - only numeric features.\n    - $cat_features - only categorical features.\n    - $target - target feature. (For survival problems will be two columns - time+event).\n    - $all - all columns except key columns.\n\nIf no know prefix detected, the filter is considered as regular expression.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      }
   }
}

Fields
field columns_to_include: List[str] = ['.*']

List of filtering expressions. By default, all columns are included.

field columns_to_exclude: List[str] = []

List of filtering expressions. Empty by default.

class logml.data.config.FillNaTransformerParams

Bases: logml.data.config.BaseTransformerParams

FillNaTransformer params

Show JSON schema
{
   "title": "FillNaTransformerParams",
   "description": "FillNaTransformer params",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "constant": {
         "title": "Constant",
         "description": "Value to replace NaN values with.",
         "anyOf": [
            {
               "type": "integer"
            },
            {
               "type": "number"
            },
            {
               "type": "string"
            }
         ]
      }
   },
   "required": [
      "constant"
   ]
}

Fields
field constant: Union[int, float, str] [Required]

Value to replace NaN values with.
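
As a rough illustration of what a constant fill does, the following stdlib-only sketch replaces missing entries in a list with the constant. The helper name fill_na is hypothetical, and treating None as missing in addition to float NaN is an assumption of the example, not a statement about FillNaTransformer itself.

```python
import math
from typing import List, Union

Value = Union[int, float, str, None]


def fill_na(values: List[Value], constant: Union[int, float, str]) -> List[Value]:
    """Return a copy of values with missing entries replaced by constant."""

    def is_missing(v: Value) -> bool:
        # Assumption: both None and float NaN count as missing.
        return v is None or (isinstance(v, float) and math.isnan(v))

    return [constant if is_missing(v) else v for v in values]


print(fill_na([1.0, float("nan"), 3.0], 0))
print(fill_na(["a", None], "unknown"))
```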

class logml.data.config.BucketDefinition

Bases: pydantic.main.BaseModel

Defines a bucket for numerical values.

Bucket: (left_bound, right_bound].

NOTE: each bound can be included in or excluded from the bucket via include_left_bound and include_right_bound.

Show JSON schema
{
   "title": "BucketDefinition",
   "description": "Defines a bucket for numerical values.\n\nBucket: (left_bound, right_bound].\n\nNOTE: left and right bounds might be included/excluded if needed.",
   "type": "object",
   "properties": {
      "left_bound": {
         "title": "Left Bound",
         "description": "Defines a left bound for the bucket.",
         "default": NaN,
         "type": "number"
      },
      "right_bound": {
         "title": "Right Bound",
         "description": "Defines a right bound for the bucket.",
         "default": NaN,
         "type": "number"
      },
      "include_left_bound": {
         "title": "Include Left Bound",
         "description": "Whether to include left bound to bucket range.",
         "default": true,
         "type": "boolean"
      },
      "include_right_bound": {
         "title": "Include Right Bound",
         "description": "Whether to include right bound to bucket range.",
         "default": true,
         "type": "boolean"
      },
      "alias": {
         "title": "Alias",
         "description": "Defines an alias for the bucket.",
         "type": "string"
      }
   },
   "required": [
      "alias"
   ]
}

Fields
field left_bound: float = nan

Defines a left bound for the bucket.

field right_bound: float = nan

Defines a right bound for the bucket.

field include_left_bound: bool = True

Whether to include the left bound in the bucket range.

field include_right_bound: bool = True

Whether to include the right bound in the bucket range.

field alias: str [Required]

Defines an alias for the bucket.
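
The interaction of the bounds and the include flags can be sketched with a small stand-in class. The dataclass Bucket below only mirrors the documented fields; it is not the pydantic model. Two things are assumptions of the sketch: a NaN bound is read as "no bound on that side", and the first matching bucket wins.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    """Stand-in mirroring the documented BucketDefinition fields."""
    alias: str
    left_bound: float = math.nan
    right_bound: float = math.nan
    include_left_bound: bool = True
    include_right_bound: bool = True

    def contains(self, x: float) -> bool:
        # Assumption: a NaN bound means the bucket is open-ended on that side.
        if not math.isnan(self.left_bound):
            if x < self.left_bound:
                return False
            if x == self.left_bound and not self.include_left_bound:
                return False
        if not math.isnan(self.right_bound):
            if x > self.right_bound:
                return False
            if x == self.right_bound and not self.include_right_bound:
                return False
        return True


def bucket_alias(x: float, buckets: List[Bucket]) -> Optional[str]:
    """Alias of the first bucket containing x, or None if no bucket matches."""
    for bucket in buckets:
        if bucket.contains(x):
            return bucket.alias
    return None


buckets = [
    Bucket(alias="low", right_bound=10),                             # (-inf, 10]
    Bucket(alias="high", left_bound=10, include_left_bound=False),   # (10, +inf)
]
print([bucket_alias(v, buckets) for v in (3, 10, 10.5)])
```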

class logml.data.config.BucketizeTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for ‘bucketize’ transformer params.

Show JSON schema
{
   "title": "BucketizeTransformerParams",
   "description": "Defines schema for 'bucketize' transformer params.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "suffix": {
         "title": "Suffix",
         "description": "Suffix that will be appended to the base column name for naming the result column.",
         "default": "__bucketized",
         "type": "string"
      },
      "buckets": {
         "title": "Buckets",
         "description": "Defines a list of buckets for transforming target column.",
         "default": [],
         "type": "array",
         "items": {
            "$ref": "#/definitions/BucketDefinition"
         }
      },
      "remove_base_columns": {
         "title": "Remove Base Columns",
         "description": "Whether base columns should be removed.",
         "default": true,
         "type": "boolean"
      }
   },
   "definitions": {
      "BucketDefinition": {
         "title": "BucketDefinition",
         "description": "Defines a bucket for numerical values.\n\nBucket: (left_bound, right_bound].\n\nNOTE: left and right bounds might be included/excluded if needed.",
         "type": "object",
         "properties": {
            "left_bound": {
               "title": "Left Bound",
               "description": "Defines a left bound for the bucket.",
               "default": NaN,
               "type": "number"
            },
            "right_bound": {
               "title": "Right Bound",
               "description": "Defines a right bound for the bucket.",
               "default": NaN,
               "type": "number"
            },
            "include_left_bound": {
               "title": "Include Left Bound",
               "description": "Whether to include left bound to bucket range.",
               "default": true,
               "type": "boolean"
            },
            "include_right_bound": {
               "title": "Include Right Bound",
               "description": "Whether to include right bound to bucket range.",
               "default": true,
               "type": "boolean"
            },
            "alias": {
               "title": "Alias",
               "description": "Defines an alias for the bucket.",
               "type": "string"
            }
         },
         "required": [
            "alias"
         ]
      }
   }
}

Fields
field suffix: str = '__bucketized'

Suffix that will be appended to the base column name for naming the result column.

field buckets: List[logml.data.config.BucketDefinition] = []

Defines a list of buckets for transforming target column.

field remove_base_columns: bool = True

Whether base columns should be removed.
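
A parameter set matching the schema above might look like the following dictionary. The column name age and the bucket aliases are made up for the example, and result_column_name is a hypothetical helper that only restates the documented suffix rule.

```python
import json

# Hypothetical parameter values; keys follow the JSON schema shown above.
bucketize_params = {
    "columns_to_include": ["^age$"],
    "suffix": "__bucketized",
    "remove_base_columns": True,
    "buckets": [
        {"alias": "young", "right_bound": 40},
        {"alias": "old", "left_bound": 40, "include_left_bound": False},
    ],
}


def result_column_name(base_column: str, suffix: str = "__bucketized") -> str:
    """The suffix is appended to the base column name to name the result column."""
    return base_column + suffix


print(result_column_name("age"))
print(json.dumps(bucketize_params, indent=3))
```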

class logml.data.config.DropColumnsTransformerParams

Bases: logml.data.config.BaseTransformerParams

Parameters for drop_columns transformer.

Show JSON schema
{
   "title": "DropColumnsTransformerParams",
   "description": "Parameters for `drop_columns` transformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "dtypes_to_include": {
         "title": "Dtypes To Include",
         "description": "List of data types. Affected columns are additionally filtered to match these types. When empty, types filter is not applied. Higher level data kinds can be used (see py:ref:`DtypeKind`), such as \"i: for integer, \"f\" for float and so on.Most frequent options are `object`, `int64`, `float64`, `datetime64[ns]`.\n\nSee `https://pandas.pydata.org/docs/user_guide/basics.html#basics-dtypes` for thelist of available standard pandas types.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      }
   }
}

Fields
field dtypes_to_include: List[str] = []

List of data types. Affected columns are additionally filtered to match these types. When empty, the type filter is not applied. Higher-level data kinds can be used (see DtypeKind), such as “i” for integer, “f” for float, and so on. The most frequent options are object, int64, float64, datetime64[ns]. See https://pandas.pydata.org/docs/user_guide/basics.html#basics-dtypes for the list of available standard pandas types.

class logml.data.config.DecompositionTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for decomposition transformers (PCA, NMF).

Show JSON schema
{
   "title": "DecompositionTransformerParams",
   "description": "Defines schema for decomposition transformers (PCA, NMF).",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "prefix": {
         "title": "Prefix",
         "type": "string"
      }
   },
   "required": [
      "prefix"
   ]
}

Fields
field inner_params: Dict = {}

field prefix: str [Required]

class logml.data.config.EncodingTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for encoding transformers (one-hot, label, etc.).

Show JSON schema
{
   "title": "EncodingTransformerParams",
   "description": "Defines schema for encoding transformers (one-hot, label, etc.).",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "scope": {
         "title": "Scope",
         "default": "local",
         "type": "string"
      }
   }
}

Fields
field inner_params: Dict = {}

field scope: str = 'local'

class logml.data.config.MultiLabelOneHotTransformerParams

Bases: logml.data.config.EncodingTransformerParams

Defines schema for multilabel one-hot encoding transformer.

Show JSON schema
{
   "title": "MultiLabelOneHotTransformerParams",
   "description": "Defines schema for multilabel one-hot encoding transformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "scope": {
         "title": "Scope",
         "default": "local",
         "type": "string"
      },
      "separator": {
         "title": "Separator",
         "default": ",",
         "type": "string"
      }
   }
}

Fields
field separator: str = ','
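
To illustrate the role of separator, here is a minimal multilabel one-hot sketch in plain Python. It assumes each cell holds several labels joined by the separator; the function name, the whitespace stripping, and the output layout (one 0/1 list per label) are choices of this example, and the naming of the real output columns is not described in the source.

```python
from typing import Dict, List


def multilabel_one_hot(cells: List[str], separator: str = ",") -> Dict[str, List[int]]:
    """Split each cell on separator and emit one 0/1 indicator list per label."""
    # Assumption: surrounding whitespace is ignored and empty pieces are dropped.
    split_rows = [
        [part.strip() for part in cell.split(separator) if part.strip()]
        for cell in cells
    ]
    labels = sorted({label for row in split_rows for label in row})
    return {label: [int(label in row) for row in split_rows] for label in labels}


print(multilabel_one_hot(["a,b", "b", ""]))
```
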
class logml.data.config.CategoricalsEncodingTransformerParams

Bases: logml.data.config.MultiLabelOneHotTransformerParams

Defines underlying encoder to use for categoricals.

Show JSON schema
{
   "title": "CategoricalsEncodingTransformerParams",
   "description": "Defines underlying encoder to use for categoricals.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "scope": {
         "title": "Scope",
         "default": "local",
         "type": "string"
      },
      "separator": {
         "title": "Separator",
         "default": ",",
         "type": "string"
      },
      "encoding": {
         "title": "Encoding",
         "type": "string"
      }
   },
   "required": [
      "encoding"
   ]
}

Fields
field encoding: str [Required]

class logml.data.config.MapEncodingTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for MapEncodingTransformer.

Show JSON schema
{
   "title": "MapEncodingTransformerParams",
   "description": "Defines schema for MapEncodingTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "mapping": {
         "title": "Mapping",
         "type": "object"
      },
      "unknown_values": {
         "title": "Unknown Values",
         "default": NaN,
         "anyOf": [
            {
               "type": "number"
            },
            {
               "type": "integer"
            },
            {
               "type": "string"
            }
         ]
      }
   },
   "required": [
      "mapping"
   ]
}

Fields
field mapping: Dict [Required]
field unknown_values: Union[float, int, str] = nan
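
A plausible reading of these two fields is a dictionary lookup with a fallback. The sketch below assumes that values missing from mapping are replaced by unknown_values; the source does not spell this out, and map_encode is a hypothetical name.

```python
import math
from typing import Any, Dict, List, Union


def map_encode(
    values: List[Any],
    mapping: Dict[Any, Any],
    unknown_values: Union[float, int, str] = math.nan,
) -> List[Any]:
    """Replace each value via mapping; unmapped values become unknown_values."""
    # Assumption: unknown_values is the fallback for keys absent from mapping.
    return [mapping.get(v, unknown_values) for v in values]


print(map_encode(["low", "high", "n/a"], {"low": 0, "high": 1}, unknown_values=-1))
```
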
class logml.data.config.FilteringTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for typical FilteringTransformer.

Show JSON schema
{
   "title": "FilteringTransformerParams",
   "description": "Defines schema for typical FilteringTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "threshold": {
         "title": "Threshold",
         "type": "number"
      }
   },
   "required": [
      "threshold"
   ]
}

Fields
field threshold: float [Required]

class logml.data.config.PrevalenceFilteringTransformerParams

Bases: logml.data.config.BaseTransformerParams

Parameters for prevalence_filtering transformer.

See PrevalenceFilteringTransformer for details.

Show JSON schema
{
   "title": "PrevalenceFilteringTransformerParams",
   "description": "Parameters for `prevalence_filtering` transformer.\n\nSee `PrevalenceFilteringTransformer` for details.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "threshold": {
         "title": "Threshold",
         "type": "number"
      },
      "values": {
         "title": "Values",
         "type": "array",
         "items": {}
      }
   },
   "required": [
      "threshold",
      "values"
   ]
}

Fields
field threshold: float [Required]

field values: List [Required]

class logml.data.config.MutationsFilteringTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for typical FilteringTransformer that uses mutations.

Show JSON schema
{
   "title": "MutationsFilteringTransformerParams",
   "description": "Defines schema for typical FilteringTransformer that uses mutations.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "mutations": {
         "title": "Mutations",
         "type": "array",
         "items": {
            "type": "string"
         }
      }
   },
   "required": [
      "mutations"
   ]
}

Fields
field mutations: List[str] [Required]

class logml.data.config.MICETransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for MICE imputing transformer.

Show JSON schema
{
   "title": "MICETransformerParams",
   "description": "Defines schema for MICE imputing transformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "random_state": {
         "title": "Random State",
         "type": "integer"
      },
      "n_nearest_features": {
         "title": "N Nearest Features",
         "default": 10,
         "type": "integer"
      },
      "max_iter": {
         "title": "Max Iter",
         "default": 20,
         "type": "integer"
      },
      "verbose": {
         "title": "Verbose",
         "default": 0,
         "type": "integer"
      },
      "sample_posterior": {
         "title": "Sample Posterior",
         "default": false,
         "type": "boolean"
      }
   }
}

Fields
field random_state: Optional[int] = None

class logml.data.config.ImputingTransformerParams

Bases: logml.data.config.EncodingTransformerParams

Defines underlying imputer to use for target columns.

Show JSON schema
{
   "title": "ImputingTransformerParams",
   "description": "Defines underlying imputer to use for target columns.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "inner_params": {
         "title": "Inner Params",
         "default": {},
         "type": "object"
      },
      "scope": {
         "title": "Scope",
         "default": "local",
         "type": "string"
      },
      "imputation": {
         "title": "Imputation",
         "type": "string"
      },
      "imputation_params": {
         "title": "Imputation Params",
         "default": {},
         "type": "object"
      }
   },
   "required": [
      "imputation"
   ]
}

Fields
field imputation: str [Required]

field imputation_params: Optional[dict] = {}

class logml.data.config.BinarizationLambdaTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for BinarizationLambdaTransformer.

Show JSON schema
{
   "title": "BinarizationLambdaTransformerParams",
   "description": "Defines schema for BinarizationLambdaTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "threshold": {
         "title": "Threshold",
         "type": "number"
      }
   },
   "required": [
      "threshold"
   ]
}

Fields
field threshold: float [Required]

class logml.data.config.QueryBooleanTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for QueryBooleanTransformer.

Show JSON schema
{
   "title": "QueryBooleanTransformerParams",
   "description": "Defines schema for QueryBooleanTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "query": {
         "title": "Query",
         "type": "string"
      }
   },
   "required": [
      "query"
   ]
}

Fields
field query: str [Required]

class logml.data.config.NormalizationTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines underlying normalizer to use for target columns.

Show JSON schema
{
   "title": "NormalizationTransformerParams",
   "description": "Defines underlying normalizer to use for target columns.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "normalization": {
         "title": "Normalization",
         "type": "string"
      },
      "params": {
         "title": "Params",
         "default": {},
         "type": "object"
      }
   },
   "required": [
      "normalization"
   ]
}

Fields
field normalization: str [Required]

field params: dict = {}

class logml.data.config.AddRandomColumnsTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines schema for AddRandomColumnsTransformer.

Show JSON schema
{
   "title": "AddRandomColumnsTransformerParams",
   "description": "Defines schema for AddRandomColumnsTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "fraction": {
         "title": "Fraction",
         "type": "number"
      }
   },
   "required": [
      "fraction"
   ]
}

Fields
field fraction: float [Required]

class logml.data.config.DropNaMode(value)

Bases: str, enum.Enum

Specifies how to apply DropNA transformation.

  • all - when all columns are NA.

  • any - when at least one is NA.

  • threshold - when the specified number or percentage is NA.

ALL = 'all'
ANY = 'any'
THRESHOLD = 'threshold'
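
Because the enum derives from both str and enum.Enum, its members compare equal to their plain string values and can be looked up by value. The snippet below uses a local copy of the enum with the documented members, so it runs without logml installed.

```python
from enum import Enum


class DropNaMode(str, Enum):
    """Local copy of the documented enum, for illustration only."""
    ALL = "all"
    ANY = "any"
    THRESHOLD = "threshold"


mode = DropNaMode("any")          # lookup by value, e.g. from a config string
print(mode is DropNaMode.ANY)     # same member object
print(DropNaMode.ALL == "all")    # str mixin: equal to the raw string
print([m.value for m in DropNaMode])
```
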
class logml.data.config.DropNanRowsTransformerParams

Bases: logml.data.config.BaseTransformerParams

Configuration for drop_nan_rows transformer.

Show JSON schema
{
   "title": "DropNanRowsTransformerParams",
   "description": "Configuration for `drop_nan_rows` transformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "threshold": {
         "title": "Threshold",
         "default": 1.0,
         "exclusiveMinimum": 0.0,
         "help": "Determine >= threshold for count nan columns. If float from 0 to 1, defines ratio. If integer >= 1, then defines number columns.",
         "type": "number"
      },
      "how": {
         "default": "all",
         "help": "Determine if row is removed when we have at least one NA or all NA.\n            - `any` : If any NA values are present, drop that row.\n            - `all` : If all values are NA, drop that row.\n            - `threshold`: Use threshold to define ratio of NA values.\n        ",
         "allOf": [
            {
               "$ref": "#/definitions/DropNaMode"
            }
         ]
      }
   },
   "definitions": {
      "DropNaMode": {
         "title": "DropNaMode",
         "description": "Specifies how to apply DropNA transformation.\n\n`all` - when all columns are NA, `any` - when at least one is NA,\n`threshold` - when specified number or percentage is NA.",
         "enum": [
            "all",
            "any",
            "threshold"
         ],
         "type": "string"
      }
   }
}

Fields
field threshold: float = 1.0
Constraints
  • exclusiveMinimum = 0.0

  • help = A row is dropped when its number of NaN columns is >= threshold. A float from 0 to 1 defines a ratio of columns; an integer >= 1 defines a number of columns.

field how: logml.data.config.DropNaMode = DropNaMode.ALL
Constraints
  • help = Determines whether a row is removed when at least one value is NA or when all values are NA:
    • any: if any NA values are present, drop that row.

    • all: if all values are NA, drop that row.

    • threshold: use threshold to define the ratio of NA values.
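
The three modes can be sketched as a per-row decision in plain Python. This is an interpretation of the help texts above, not the transformer's code. In particular, the source does not say how a threshold of exactly 1.0 is read; this sketch arbitrarily treats values below 1 as a ratio of columns and values of 1 or more as a column count. The names is_na and should_drop are hypothetical.

```python
import math
from typing import Any, Sequence


def is_na(value: Any) -> bool:
    # Assumption: None and float NaN both count as NA.
    return value is None or (isinstance(value, float) and math.isnan(value))


def should_drop(row: Sequence[Any], how: str = "all", threshold: float = 1.0) -> bool:
    """Decide whether a row is dropped under the given mode."""
    n_na = sum(is_na(v) for v in row)
    if how == "any":
        return n_na >= 1
    if how == "all":
        return len(row) > 0 and n_na == len(row)
    if how == "threshold":
        # Assumption: threshold < 1 is a ratio of columns, otherwise a count.
        needed = threshold * len(row) if threshold < 1 else threshold
        return n_na >= needed
    raise ValueError(f"unknown mode: {how!r}")


row = [1.0, None, float("nan"), 4.0]   # 2 of 4 values are NA
print(should_drop(row, how="any"))
print(should_drop(row, how="all"))
print(should_drop(row, how="threshold", threshold=0.5))
print(should_drop(row, how="threshold", threshold=3))
```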

class logml.data.config.ResolveMultipleChoiceTransformerParams

Bases: logml.data.config.BaseTransformerParams

Defines parameters for ResolveMultipleChoiceTransformer.

Show JSON schema
{
   "title": "ResolveMultipleChoiceTransformerParams",
   "description": "Defines parameters for ResolveMultipleChoiceTransformer.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "keep_first_value": {
         "title": "Keep First Value",
         "default": true,
         "type": "boolean"
      },
      "delimeter": {
         "title": "Delimeter",
         "default": ",",
         "type": "string"
      }
   }
}

Fields
field keep_first_value: bool = True
field delimeter: str = ','
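
The two fields above carry no description, so the following is only a guess at the intended behaviour, written as a plain-Python sketch. The function name `resolve_multiple_choice` is hypothetical, and the assumption that `keep_first_value=False` keeps the last value is not confirmed by the source. The parameter spelling `delimeter` follows the field name in the schema.

```python
def resolve_multiple_choice(value, keep_first_value=True, delimeter=","):
    """Collapse a delimited multi-choice string to a single choice (illustrative only)."""
    # Non-string values and single-choice strings pass through unchanged.
    if not isinstance(value, str) or delimeter not in value:
        return value
    parts = [part.strip() for part in value.split(delimeter)]
    # Assumption: False means "keep the last value"; the source does not say.
    return parts[0] if keep_first_value else parts[-1]


first = resolve_multiple_choice("A, B, C")
last = resolve_multiple_choice("A, B, C", keep_first_value=False)
single = resolve_multiple_choice("A")
piped = resolve_multiple_choice("X|Y", delimeter="|")
```
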
class logml.data.config.RemoveCorrelatedColumnsParams

Bases: logml.data.config.BaseTransformerParams

Defines thresholds that will be used for Correlated columns removal.

Show JSON schema
{
   "title": "RemoveCorrelatedColumnsParams",
   "description": "Defines thresholds that will be used for Correlated columns removal.",
   "type": "object",
   "properties": {
      "columns_to_include": {
         "title": "Columns To Include",
         "description": "List of filtering expressions. By default, all columns are included.",
         "default": [
            ".*"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "columns_to_exclude": {
         "title": "Columns To Exclude",
         "description": "List of filtering expressions. Empty by default.",
         "default": [],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "correlation_type": {
         "description": "Type of correlation that will be used for removing correlated features.",
         "default": "spearman",
         "allOf": [
            {
               "$ref": "#/definitions/CorrelationType"
            }
         ]
      },
      "correlation_threshold": {
         "title": "Correlation Threshold",
         "description": "Defines a correlation threshold that will be used to identify \"correlated\" features.",
         "default": 0.9,
         "type": "number"
      },
      "correlation_min_samples_fraction": {
         "title": "Correlation Min Samples Fraction",
         "description": "Additional parameter that defines the minimum fraction of samples that is required to calculate\n            correlation coefficient between two columns. As NaNs are ignored and correlation coefficient is calculated\n            on top of non-NaN subset of rows for a pair of columns - this parameter could help to make the results\n            more meaningful. Please see the reference of \"min_periods\" here:\n            https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html\n            ",
         "default": 0.3,
         "type": "number"
      },
      "correlation_group_level_cutoff": {
         "title": "Correlation Group Level Cutoff",
         "description": "Sets cutoff for how many levels of neighbours to consider when building correlation groups.\n\n        For example consider the following correlation matrix:\n\n        .. code-block::\n\n                    a    b    c    d\n                a  1.0  0.8  0.8  0.7\n                b  0.8  1.0    0    0\n                c  0.8    0  1.0  0.8\n                d  0.7    0  0.8  1.0\n\n        Let's say, we use threshold as ``> 0.7``. In this case `a` is correlated strongly with `b` and `c`, and \n        `c` correlated with `d`.\n\n        When we set cutoff to `1`, we use direct neighbours only, so there is one group `'a', 'c', 'b'`. \n        In this case `d` is not included, because the group has been already formed around `a` column.\n\n        If we set it to `-1` or anything more than 1, we use all reachable neighbours. In this case, correlation \n        group is formed as ``'a', 'c', 'b', 'd'`` due to fact that `d` is strongly correlated with `c`, disregarding \n        it weak connection to `a`. As you can see, it will result in larger groups, and possibility to assign to the \n        same group columns with correlation less than a threshold. It could reflect cross-correlation more\n        naturally in some cases.\n        ",
         "default": 1,
         "type": "integer"
      },
      "correlation_key_names": {
         "title": "Correlation Key Names",
         "description": "Defines a list of biologically rational gene names (subst) that\n            will be used for correlation groups naming. In case some of those names will appear in one of column names\n            within the same correlation group - the result correlation group identifier will contain those names.",
         "default": [
            "TP53",
            "KRAS",
            "CDKN2A",
            "CDKN2B",
            "PIK3CA",
            "ATM",
            "BRCA1",
            "SOX2",
            "GNAS2",
            "TERC",
            "STK11",
            "PDCD1",
            "LAG3",
            "TIGIT",
            "HAVCR2",
            "EOMES",
            "MTAP"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      }
   },
   "definitions": {
      "CorrelationType": {
         "title": "CorrelationType",
         "description": "Defines available correlation types.",
         "enum": [
            "pearson",
            "spearman"
         ],
         "type": "string"
      }
   }
}

Fields
field correlation_type: logml.configuration.eda.CorrelationType = CorrelationType.SPEARMAN

Type of correlation that will be used for removing correlated features.

field correlation_threshold: float = 0.9

Defines a correlation threshold that will be used to identify “correlated” features.

field correlation_min_samples_fraction: float = 0.3

Defines the minimum fraction of samples required to calculate the correlation coefficient between two columns. NaNs are ignored, so the coefficient is calculated only on the rows where both columns are non-NaN; this parameter helps make the results more meaningful. See the “min_periods” parameter here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html
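
As a rough illustration of the idea, the sketch below checks whether a pair of columns has enough rows where both values are present. It is plain Python under stated assumptions: the helper name `has_enough_overlap` is hypothetical, and how logml converts the fraction into a row count (rounding, inclusive or exclusive comparison) is not specified in the source.

```python
import math


def has_enough_overlap(x, y, min_samples_fraction=0.3):
    """Return True if enough rows have non-NaN values in both columns."""
    paired = sum(
        1 for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))
    )
    # Assumption: the fraction is taken relative to the total number of rows.
    return paired >= min_samples_fraction * len(x)


nan = float("nan")
x = [1.0, 2.0, nan, 4.0, nan, nan, nan, nan, nan, nan]
y = [2.0, nan, 3.0, 8.0, nan, nan, nan, nan, nan, nan]

# Only 2 of the 10 rows have values in both columns.
ok_default = has_enough_overlap(x, y)
ok_relaxed = has_enough_overlap(x, y, min_samples_fraction=0.2)
```

With the default of 0.3 this pair would be skipped, since two overlapping rows out of ten fall below the required fraction; relaxing the fraction to 0.2 would admit it.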

field correlation_group_level_cutoff: int = 1

Sets the cutoff for how many levels of neighbours to consider when building correlation groups.

For example, consider the following correlation matrix:

        a    b    c    d
    a  1.0  0.8  0.8  0.7
    b  0.8  1.0    0    0
    c  0.8    0  1.0  0.8
    d  0.7    0  0.8  1.0

Suppose the threshold is > 0.7. Then a is strongly correlated with b and c, and c is strongly correlated with d.

With the cutoff set to 1, only direct neighbours are used, so there is one group: 'a', 'c', 'b'. Here d is not included, because the group has already been formed around column a.

With the cutoff set to -1 or anything more than 1, all reachable neighbours are used. The correlation group is then 'a', 'c', 'b', 'd', because d is strongly correlated with c, regardless of its weak connection to a. This results in larger groups and makes it possible for columns whose correlation is below the threshold to be assigned to the same group. In some cases it may reflect cross-correlation more naturally.
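
The grouping described for this field can be sketched as a depth-limited breadth-first search over the "strongly correlated" graph. This is an illustration, not the logml implementation: the function name `correlation_groups` is hypothetical, a negative cutoff is assumed to mean "no depth limit", positive cutoffs are treated as a maximum depth, and columns left without a group end up as singletons here.

```python
from collections import deque


def correlation_groups(corr, threshold, level_cutoff=1):
    """Group columns by walking neighbours whose correlation exceeds the threshold."""
    columns = list(corr)
    assigned = set()
    groups = []
    for seed in columns:
        if seed in assigned:
            continue
        group = [seed]
        assigned.add(seed)
        queue = deque([(seed, 0)])
        while queue:
            col, depth = queue.popleft()
            # Assumption: a negative cutoff disables the depth limit.
            if level_cutoff >= 0 and depth >= level_cutoff:
                continue
            for other in columns:
                if other not in assigned and corr[col][other] > threshold:
                    assigned.add(other)
                    group.append(other)
                    queue.append((other, depth + 1))
        groups.append(group)
    return groups


# The correlation matrix from the example in the field description.
corr = {
    "a": {"a": 1.0, "b": 0.8, "c": 0.8, "d": 0.7},
    "b": {"a": 0.8, "b": 1.0, "c": 0.0, "d": 0.0},
    "c": {"a": 0.8, "b": 0.0, "c": 1.0, "d": 0.8},
    "d": {"a": 0.7, "b": 0.0, "c": 0.8, "d": 1.0},
}
direct = correlation_groups(corr, threshold=0.7, level_cutoff=1)
reachable = correlation_groups(corr, threshold=0.7, level_cutoff=-1)
```

With a cutoff of 1 the search stops at the direct neighbours of a, so d is left out of that group; with -1 the search continues through c and pulls d in, even though the correlation between a and d does not exceed the threshold.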

field correlation_key_names: List[str] = ['TP53', 'KRAS', 'CDKN2A', 'CDKN2B', 'PIK3CA', 'ATM', 'BRCA1', 'SOX2', 'GNAS2', 'TERC', 'STK11', 'PDCD1', 'LAG3', 'TIGIT', 'HAVCR2', 'EOMES', 'MTAP']

Defines a list of biologically meaningful gene names (matched as substrings) that are used for naming correlation groups. If any of these names appears in a column name within a correlation group, the resulting correlation group identifier will contain that name.
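
The substring matching described above might look like the following sketch. The helper name `key_names_in_group` and the example column names are hypothetical, and the sketch only finds which key names occur in a group; the actual format of the group identifier is not documented here and is not reproduced.

```python
def key_names_in_group(group_columns, key_names):
    """Return the key names that occur as substrings of any column in the group."""
    return [
        name
        for name in key_names
        if any(name in column for column in group_columns)
    ]


# Hypothetical column names, for illustration only.
group = ["TP53_DNA", "MDM2_RNA", "KRAS_mut"]
matched = key_names_in_group(group, ["TP53", "KRAS", "ATM"])
unmatched = key_names_in_group(["GENE_X"], ["TP53", "KRAS", "ATM"])
```
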