logml.data.config
Classes
DropNaMode - Specifies how to apply DropNA transformation.
- class logml.data.config.BaseTransformerParams
Bases:
pydantic.main.BaseModel
Defines schema for transformer params.
Columns inclusion/exclusion schema (see also get_affected_columns):
- Build the set as the union of all columns that match the columns_to_include filters.
- Subtract the columns that match the columns_to_exclude filters.
Filtering expressions are identified by prefix:
- ‘re:’ or empty - regular expression. Any valid Python regular expression, e.g. “.*_DNA$”.
- ‘g:’ - column group filter. Must completely match the group name, e.g. “g:clinical_data”.
- ‘$’ - keyword:
  - $features - all features (input columns, covariates).
  - $numeric_features - only numeric features.
  - $cat_features - only categorical features.
  - $target - target feature. (For survival problems this is two columns: time and event.)
  - $all - all columns except key columns.
If no known prefix is detected, the filter is treated as a regular expression.
Show JSON schema
{ "title": "BaseTransformerParams", "description": "Defines schema for transformer params.\n\nColumns inclusion/exclusion schema (see also `get_affected_columns`):\n\n- make set by union all columns that match `include_columns` filter.\n- subtract columns that match `exclude_columns` filter.\n\nFiltering expressions are identified by prefix:\n\n- 're:' or empty - regular expression. Any valid python regular expression, e.g. \".*_DNA$\"\n- 'g:' - columns' group filter. Should completely match group name, e.g. \"g:clinical_data\".\n- '$' - keyword:\n - $features - all features (input columns, covariates).\n - $numeric_features - only numeric features.\n - $cat_features - only categorical features.\n - $target - target feature. (For survival problems will be two columns - time+event).\n - $all - all columns except key columns.\n\nIf no know prefix detected, the filter is considered as regular expression.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } } } }
- field columns_to_include: List[str] = ['.*']
List of filtering expressions. By default, all columns are included.
- field columns_to_exclude: List[str] = []
List of filtering expressions. Empty by default.
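The inclusion/exclusion rules above can be illustrated with a small standalone sketch. This is not logml's `get_affected_columns`: the functions `match_filter` and `resolve_columns`, the `groups` and `keywords` lookup tables, and the use of `re.match` for regular-expression filters are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Sequence, Set


def match_filter(expr: str, columns: Sequence[str],
                 groups: Dict[str, List[str]],
                 keywords: Dict[str, List[str]]) -> Set[str]:
    """Return the columns selected by a single filtering expression."""
    if expr.startswith("g:"):
        # Group filter: the group name must match completely.
        return set(groups.get(expr[2:], [])) & set(columns)
    if expr.startswith("$"):
        # Keyword filter such as "$target" (hypothetical lookup table).
        return set(keywords.get(expr, [])) & set(columns)
    # "re:" prefix or no known prefix: treat as a regular expression.
    pattern = expr[3:] if expr.startswith("re:") else expr
    # Assumption: patterns are matched from the start of the name (re.match).
    return {c for c in columns if re.match(pattern, c)}


def resolve_columns(columns, include=(".*",), exclude=(),
                    groups=None, keywords=None) -> List[str]:
    """Union of all include matches minus all exclude matches."""
    groups, keywords = groups or {}, keywords or {}
    selected: Set[str] = set()
    for expr in include:
        selected |= match_filter(expr, columns, groups, keywords)
    for expr in exclude:
        selected -= match_filter(expr, columns, groups, keywords)
    # Keep the original column order for readability.
    return [c for c in columns if c in selected]


columns = ["age", "sex", "TP53_DNA", "KRAS_DNA", "os_time"]
groups = {"clinical_data": ["age", "sex"]}
keywords = {"$target": ["os_time"]}

selected = resolve_columns(
    columns,
    include=[".*_DNA$", "g:clinical_data"],
    exclude=["re:^KRAS"],
    groups=groups,
    keywords=keywords,
)
```

With the defaults (`include=(".*",)`, `exclude=()`), every column is selected, which mirrors the documented defaults of `columns_to_include` and `columns_to_exclude`.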
- class logml.data.config.FillNaTransformerParams
Bases:
logml.data.config.BaseTransformerParams
FillNaTransformer params
Show JSON schema
{ "title": "FillNaTransformerParams", "description": "FillNaTransformer params", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "constant": { "title": "Constant", "description": "Value to replace NaN values with.", "anyOf": [ { "type": "integer" }, { "type": "number" }, { "type": "string" } ] } }, "required": [ "constant" ] }
- field constant: Union[int, float, str] [Required]
Value to replace NaN values with.
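As a rough illustration of what a constant fill does, the sketch below replaces missing entries in a plain Python list. The helper `fill_na` and the example `params` dictionary are hypothetical; only the field names `columns_to_include` and `constant` come from the schema above.

```python
import math


def fill_na(values, constant):
    """Replace None and float NaN entries with `constant`."""
    def is_na(v):
        return v is None or (isinstance(v, float) and math.isnan(v))
    return [constant if is_na(v) else v for v in values]


# Hypothetical params mirroring the FillNaTransformerParams fields.
params = {"columns_to_include": ["$numeric_features"], "constant": 0}

filled = fill_na([1.5, float("nan"), None, 3.0], params["constant"])
```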
- class logml.data.config.BucketDefinition
Bases:
pydantic.main.BaseModel
Defines a bucket for numerical values.
Bucket: (left_bound, right_bound].
NOTE: left and right bounds might be included/excluded if needed.
Show JSON schema
{ "title": "BucketDefinition", "description": "Defines a bucket for numerical values.\n\nBucket: (left_bound, right_bound].\n\nNOTE: left and right bounds might be included/excluded if needed.", "type": "object", "properties": { "left_bound": { "title": "Left Bound", "description": "Defines a left bound for the bucket.", "default": NaN, "type": "number" }, "right_bound": { "title": "Right Bound", "description": "Defines a right bound for the bucket.", "default": NaN, "type": "number" }, "include_left_bound": { "title": "Include Left Bound", "description": "Whether to include left bound to bucket range.", "default": true, "type": "boolean" }, "include_right_bound": { "title": "Include Right Bound", "description": "Whether to include right bound to bucket range.", "default": true, "type": "boolean" }, "alias": { "title": "Alias", "description": "Defines an alias for the bucket.", "type": "string" } }, "required": [ "alias" ] }
- Fields
- field left_bound: float = nan
Defines a left bound for the bucket.
- field right_bound: float = nan
Defines a right bound for the bucket.
- field include_left_bound: bool = True
Whether to include left bound to bucket range.
- field include_right_bound: bool = True
Whether to include right bound to bucket range.
- field alias: str [Required]
Defines an alias for the bucket.
- class logml.data.config.BucketizeTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for ‘bucketize’ transformer params.
Show JSON schema
{ "title": "BucketizeTransformerParams", "description": "Defines schema for 'bucketize' transformer params.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "suffix": { "title": "Suffix", "description": "Suffix that will be appended to the base column name for naming the result column.", "default": "__bucketized", "type": "string" }, "buckets": { "title": "Buckets", "description": "Defines a list of buckets for transforming target column.", "default": [], "type": "array", "items": { "$ref": "#/definitions/BucketDefinition" } }, "remove_base_columns": { "title": "Remove Base Columns", "description": "Whether base columns should be removed.", "default": true, "type": "boolean" } }, "definitions": { "BucketDefinition": { "title": "BucketDefinition", "description": "Defines a bucket for numerical values.\n\nBucket: (left_bound, right_bound].\n\nNOTE: left and right bounds might be included/excluded if needed.", "type": "object", "properties": { "left_bound": { "title": "Left Bound", "description": "Defines a left bound for the bucket.", "default": NaN, "type": "number" }, "right_bound": { "title": "Right Bound", "description": "Defines a right bound for the bucket.", "default": NaN, "type": "number" }, "include_left_bound": { "title": "Include Left Bound", "description": "Whether to include left bound to bucket range.", "default": true, "type": "boolean" }, "include_right_bound": { "title": "Include Right Bound", "description": "Whether to include right bound to bucket range.", "default": true, "type": "boolean" }, "alias": { "title": "Alias", "description": "Defines an alias for the bucket.", "type": "string" } }, "required": [ "alias" ] } } }
- field suffix: str = '__bucketized'
Suffix that will be appended to the base column name for naming the result column.
- field buckets: List[logml.data.config.BucketDefinition] = []
Defines a list of buckets for transforming target column.
- field remove_base_columns: bool = True
Whether base columns should be removed.
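The bucket fields can be read as an interval test with optional inclusion of each bound. The sketch below is a stand-in, not logml's implementation: the `Bucket` dataclass copies the field names and defaults of `BucketDefinition`, while the `contains` and `bucketize` helpers, the "first matching bucket wins" rule, and the reading of a NaN bound as "unbounded on that side" are assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bucket:
    """Stand-in with the same field names and defaults as BucketDefinition."""
    alias: str
    left_bound: float = math.nan
    right_bound: float = math.nan
    include_left_bound: bool = True
    include_right_bound: bool = True

    def contains(self, value: float) -> bool:
        if math.isnan(value):
            return False  # missing values fall into no bucket
        # Assumption: a NaN bound means the bucket is open-ended on that side.
        if not math.isnan(self.left_bound):
            if value < self.left_bound:
                return False
            if value == self.left_bound and not self.include_left_bound:
                return False
        if not math.isnan(self.right_bound):
            if value > self.right_bound:
                return False
            if value == self.right_bound and not self.include_right_bound:
                return False
        return True


def bucketize(value: float, buckets: List[Bucket]) -> Optional[str]:
    """Return the alias of the first bucket containing `value`, else None."""
    for bucket in buckets:
        if bucket.contains(value):
            return bucket.alias
    return None


buckets = [
    Bucket(alias="low", right_bound=50.0, include_right_bound=False),
    Bucket(alias="high", left_bound=50.0),
]
labels = [bucketize(v, buckets) for v in (10.0, 50.0, 75.0)]
```

Excluding the right bound of the first bucket keeps the two buckets disjoint at the shared boundary value 50.0.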
- class logml.data.config.DropColumnsTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Parameters for drop_columns transformer.
Show JSON schema
{ "title": "DropColumnsTransformerParams", "description": "Parameters for `drop_columns` transformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "dtypes_to_include": { "title": "Dtypes To Include", "description": "List of data types. Affected columns are additionally filtered to match these types. When empty, types filter is not applied. Higher level data kinds can be used (see py:ref:`DtypeKind`), such as \"i: for integer, \"f\" for float and so on.Most frequent options are `object`, `int64`, `float64`, `datetime64[ns]`.\n\nSee `https://pandas.pydata.org/docs/user_guide/basics.html#basics-dtypes` for thelist of available standard pandas types.", "default": [], "type": "array", "items": { "type": "string" } } } }
- field dtypes_to_include: List[str] = []
List of data types. Affected columns are additionally filtered to match these types. When empty, the type filter is not applied. Higher-level data kinds can be used (see DtypeKind), such as “i” for integer, “f” for float, and so on. The most frequent options are object, int64, float64, datetime64[ns]. See https://pandas.pydata.org/docs/user_guide/basics.html#basics-dtypes for the list of available standard pandas types.
- class logml.data.config.DecompositionTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for decomposition transformers (PCA, NMF).
Show JSON schema
{ "title": "DecompositionTransformerParams", "description": "Defines schema for decomposition transformers (PCA, NMF).", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "inner_params": { "title": "Inner Params", "default": {}, "type": "object" }, "prefix": { "title": "Prefix", "type": "string" } }, "required": [ "prefix" ] }
- Fields
- field inner_params: Dict = {}
- field prefix: str [Required]
- class logml.data.config.EncodingTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for encoding transformers (one-hot, label, etc.).
Show JSON schema
{ "title": "EncodingTransformerParams", "description": "Defines schema for encoding transformers (one-hot, label, etc.).", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "inner_params": { "title": "Inner Params", "default": {}, "type": "object" }, "scope": { "title": "Scope", "default": "local", "type": "string" } } }
- Fields
- field inner_params: Dict = {}
- field scope: str = 'local'
- class logml.data.config.MultiLabelOneHotTransformerParams
Bases:
logml.data.config.EncodingTransformerParams
Defines schema for multilabel one-hot encoding transformer.
Show JSON schema
{ "title": "MultiLabelOneHotTransformerParams", "description": "Defines schema for multilabel one-hot encoding transformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "inner_params": { "title": "Inner Params", "default": {}, "type": "object" }, "scope": { "title": "Scope", "default": "local", "type": "string" }, "separator": { "title": "Separator", "default": ",", "type": "string" } } }
- Fields
- field separator: str = ','
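To show what the `separator` field controls, here is a minimal sketch of multilabel one-hot encoding on plain strings. The function `multilabel_one_hot` is hypothetical, and stripping whitespace around labels and skipping empty labels are assumptions, not documented behaviour.

```python
from typing import Dict, List


def multilabel_one_hot(values: List[str], separator: str = ",") -> Dict[str, List[int]]:
    """Split each cell on `separator` and build one 0/1 indicator column per label."""
    # Assumption: surrounding whitespace is ignored and empty labels are skipped.
    split = [[p.strip() for p in v.split(separator) if p.strip()] for v in values]
    labels = sorted({label for parts in split for label in parts})
    return {label: [int(label in parts) for parts in split] for label in labels}


encoded = multilabel_one_hot(["lung,liver", "liver", ""])
```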
- class logml.data.config.CategoricalsEncodingTransformerParams
Bases:
logml.data.config.MultiLabelOneHotTransformerParams
Defines underlying encoder to use for categoricals.
Show JSON schema
{ "title": "CategoricalsEncodingTransformerParams", "description": "Defines underlying encoder to use for categoricals.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "inner_params": { "title": "Inner Params", "default": {}, "type": "object" }, "scope": { "title": "Scope", "default": "local", "type": "string" }, "separator": { "title": "Separator", "default": ",", "type": "string" }, "encoding": { "title": "Encoding", "type": "string" } }, "required": [ "encoding" ] }
- Fields
- field encoding: str [Required]
- class logml.data.config.MapEncodingTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for MapEncodingTransformer.
Show JSON schema
{ "title": "MapEncodingTransformerParams", "description": "Defines schema for MapEncodingTransformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "mapping": { "title": "Mapping", "type": "object" }, "unknown_values": { "title": "Unknown Values", "default": NaN, "anyOf": [ { "type": "number" }, { "type": "integer" }, { "type": "string" } ] } }, "required": [ "mapping" ] }
- field mapping: Dict [Required]
- field unknown_values: Union[float, int, str] = nan
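The two fields suggest a dictionary lookup with a fallback for values missing from the mapping. The sketch below shows that reading; `map_encode` is a hypothetical helper, and applying `unknown_values` to every unmapped value is an assumption.

```python
import math


def map_encode(values, mapping, unknown_values=math.nan):
    """Replace each value by mapping[value]; unmapped values get `unknown_values`."""
    return [mapping.get(v, unknown_values) for v in values]


mapping = {"low": 0, "high": 1}
encoded = map_encode(["low", "high", "n/a"], mapping)
encoded_flagged = map_encode(["low", "n/a"], mapping, unknown_values=-1)
```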
- class logml.data.config.FilteringTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for typical FilteringTransformer.
Show JSON schema
{ "title": "FilteringTransformerParams", "description": "Defines schema for typical FilteringTransformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "threshold": { "title": "Threshold", "type": "number" } }, "required": [ "threshold" ] }
- Fields
- field threshold: float [Required]
- class logml.data.config.PrevalenceFilteringTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Parameters for prevalence_filtering transformer.
See PrevalenceFilteringTransformer for details.
Show JSON schema
{ "title": "PrevalenceFilteringTransformerParams", "description": "Parameters for `prevalence_filtering` transformer.\n\nSee `PrevalenceFilteringTransformer` for details.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "threshold": { "title": "Threshold", "type": "number" }, "values": { "title": "Values", "type": "array", "items": {} } }, "required": [ "threshold", "values" ] }
- Fields
- field threshold: float [Required]
- field values: List [Required]
- class logml.data.config.MutationsFilteringTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for typical FilteringTransformer that uses mutations.
Show JSON schema
{ "title": "MutationsFilteringTransformerParams", "description": "Defines schema for typical FilteringTransformer that uses mutations.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "mutations": { "title": "Mutations", "type": "array", "items": { "type": "string" } } }, "required": [ "mutations" ] }
- Fields
- field mutations: List[str] [Required]
- class logml.data.config.MICETransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for MICE imputing transformer.
Show JSON schema
{ "title": "MICETransformerParams", "description": "Defines schema for MICE imputing transformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "random_state": { "title": "Random State", "type": "integer" }, "n_nearest_features": { "title": "N Nearest Features", "default": 10, "type": "integer" }, "max_iter": { "title": "Max Iter", "default": 20, "type": "integer" }, "verbose": { "title": "Verbose", "default": 0, "type": "integer" }, "sample_posterior": { "title": "Sample Posterior", "default": false, "type": "boolean" } } }
- Fields
- field random_state: Optional[int] = None
- class logml.data.config.ImputingTransformerParams
Bases:
logml.data.config.EncodingTransformerParams
Defines underlying imputer to use for target columns.
Show JSON schema
{ "title": "ImputingTransformerParams", "description": "Defines underlying imputer to use for target columns.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "inner_params": { "title": "Inner Params", "default": {}, "type": "object" }, "scope": { "title": "Scope", "default": "local", "type": "string" }, "imputation": { "title": "Imputation", "type": "string" }, "imputation_params": { "title": "Imputation Params", "default": {}, "type": "object" } }, "required": [ "imputation" ] }
- field imputation: str [Required]
- field imputation_params: Optional[dict] = {}
- class logml.data.config.BinarizationLambdaTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for BinarizationLambdaTransformer.
Show JSON schema
{ "title": "BinarizationLambdaTransformerParams", "description": "Defines schema for BinarizationLambdaTransformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "threshold": { "title": "Threshold", "type": "number" } }, "required": [ "threshold" ] }
- Fields
- field threshold: float [Required]
- class logml.data.config.QueryBooleanTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for QueryBooleanTransformer.
Show JSON schema
{ "title": "QueryBooleanTransformerParams", "description": "Defines schema for QueryBooleanTransformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "query": { "title": "Query", "type": "string" } }, "required": [ "query" ] }
- Fields
- field query: str [Required]
- class logml.data.config.NormalizationTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines underlying normalizer to use for target columns.
Show JSON schema
{ "title": "NormalizationTransformerParams", "description": "Defines underlying normalizer to use for target columns.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "normalization": { "title": "Normalization", "type": "string" }, "params": { "title": "Params", "default": {}, "type": "object" } }, "required": [ "normalization" ] }
- Fields
- field normalization: str [Required]
- field params: dict = {}
- class logml.data.config.AddRandomColumnsTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines schema for AddRandomColumnsTransformer.
Show JSON schema
{ "title": "AddRandomColumnsTransformerParams", "description": "Defines schema for AddRandomColumnsTransformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "fraction": { "title": "Fraction", "type": "number" } }, "required": [ "fraction" ] }
- Fields
- field fraction: float [Required]
- class logml.data.config.DropNaMode(value)
Bases:
str, enum.Enum
Specifies how to apply DropNA transformation.
all - when all columns are NA; any - when at least one column is NA; threshold - when the specified number or percentage of columns is NA.
- ALL = 'all'
- ANY = 'any'
- THRESHOLD = 'threshold'
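Because the enum derives from both `str` and `enum.Enum`, its members compare equal to their string values, which is why plain strings such as "any" can be used in configuration. The definition below is a reconstruction from the bases and members listed above, not a copy of the logml source.

```python
from enum import Enum


class DropNaMode(str, Enum):
    """Reconstruction of the documented enum (same bases and members)."""
    ALL = "all"
    ANY = "any"
    THRESHOLD = "threshold"


mode = DropNaMode("any")          # look up a member by its value
is_plain_string = mode == "any"   # str mixin: the member equals its value
all_values = [m.value for m in DropNaMode]
```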
- class logml.data.config.DropNanRowsTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Configuration for drop_nan_rows transformer.
Show JSON schema
{ "title": "DropNanRowsTransformerParams", "description": "Configuration for `drop_nan_rows` transformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "threshold": { "title": "Threshold", "default": 1.0, "exclusiveMinimum": 0.0, "help": "Determine >= threshold for count nan columns. If float from 0 to 1, defines ratio. If integer >= 1, then defines number columns.", "type": "number" }, "how": { "default": "all", "help": "Determine if row is removed when we have at least one NA or all NA.\n - `any` : If any NA values are present, drop that row.\n - `all` : If all values are NA, drop that row.\n - `threshold`: Use threshold to define ratio of NA values.\n ", "allOf": [ { "$ref": "#/definitions/DropNaMode" } ] } }, "definitions": { "DropNaMode": { "title": "DropNaMode", "description": "Specifies how to apply DropNA transformation.\n\n`all` - when all columns are NA, `any` - when at least one is NA,\n`threshold` - when specified number or percentage is NA.", "enum": [ "all", "any", "threshold" ], "type": "string" } } }
- field threshold: float = 1.0
- Constraints
exclusiveMinimum = 0.0
help = A row is dropped when its count of NaN columns is >= threshold. A float from 0 to 1 defines a ratio of columns; an integer >= 1 defines a number of columns.
- field how: logml.data.config.DropNaMode = DropNaMode.ALL
- Constraints
help = Determines whether a row is removed when at least one value is NA or when all values are NA.
- any: If any NA values are present, drop that row.
- all: If all values are NA, drop that row.
- threshold: Use threshold to define the ratio of NA values.
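The interaction between `how` and `threshold` can be sketched on rows stored as plain lists. This is an illustration only: `drop_nan_rows` is a hypothetical helper, and treating thresholds below 1 as ratios and thresholds of 1 or more as column counts is one assumed reading of the help text (the boundary value 1.0 is ambiguous in the source).

```python
import math


def is_na(v) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def drop_nan_rows(rows, how="all", threshold=1.0):
    """Return the rows that survive the chosen NA rule."""
    kept = []
    for row in rows:
        n_na = sum(is_na(v) for v in row)
        if how == "any":
            drop = n_na >= 1
        elif how == "all":
            drop = n_na == len(row)
        elif how == "threshold":
            # Assumption: < 1 is a ratio of columns, >= 1 is a number of columns.
            limit = threshold * len(row) if threshold < 1 else threshold
            drop = n_na >= limit
        else:
            raise ValueError(f"unknown mode: {how!r}")
        if not drop:
            kept.append(row)
    return kept


nan = float("nan")
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [nan, 2.0, 3.0, 4.0],
    [nan, nan, nan, 4.0],
    [nan, nan, nan, nan],
]
kept_all = drop_nan_rows(rows, how="all")
kept_any = drop_nan_rows(rows, how="any")
kept_ratio = drop_nan_rows(rows, how="threshold", threshold=0.5)
kept_count = drop_nan_rows(rows, how="threshold", threshold=3)
```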
- class logml.data.config.ResolveMultipleChoiceTransformerParams
Bases:
logml.data.config.BaseTransformerParams
Defines parameters for ResolveMultipleChoiceTransformer.
Show JSON schema
{ "title": "ResolveMultipleChoiceTransformerParams", "description": "Defines parameters for ResolveMultipleChoiceTransformer.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "keep_first_value": { "title": "Keep First Value", "default": true, "type": "boolean" }, "delimeter": { "title": "Delimeter", "default": ",", "type": "string" } } }
- field keep_first_value: bool = True
- field delimeter: str = ','
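A plausible reading of these two fields is that a cell holding several delimited choices is reduced to a single one. The sketch below illustrates that reading with a hypothetical helper; the source does not say what happens when `keep_first_value` is False, so keeping the last value in that case is purely a placeholder assumption, as is stripping whitespace. The parameter name `delimeter` follows the field's spelling.

```python
def resolve_multiple_choice(value: str, keep_first_value: bool = True,
                            delimeter: str = ",") -> str:
    """Reduce a delimited multi-choice cell to a single value."""
    parts = [p.strip() for p in value.split(delimeter)]
    # Assumption: when keep_first_value is False, the last value is kept.
    return parts[0] if keep_first_value else parts[-1]


resolved = [resolve_multiple_choice(v) for v in ["A,B", "C", "D, E, F"]]
```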
- class logml.data.config.RemoveCorrelatedColumnsParams
Bases:
logml.data.config.BaseTransformerParams
Defines thresholds that will be used for Correlated columns removal.
Show JSON schema
{ "title": "RemoveCorrelatedColumnsParams", "description": "Defines thresholds that will be used for Correlated columns removal.", "type": "object", "properties": { "columns_to_include": { "title": "Columns To Include", "description": "List of filtering expressions. By default, all columns are included.", "default": [ ".*" ], "type": "array", "items": { "type": "string" } }, "columns_to_exclude": { "title": "Columns To Exclude", "description": "List of filtering expressions. Empty by default.", "default": [], "type": "array", "items": { "type": "string" } }, "correlation_type": { "description": "Type of correlation that will be used for removing correlated features.", "default": "spearman", "allOf": [ { "$ref": "#/definitions/CorrelationType" } ] }, "correlation_threshold": { "title": "Correlation Threshold", "description": "Defines a correlation threshold that will be used to identify \"correlated\" features.", "default": 0.9, "type": "number" }, "correlation_min_samples_fraction": { "title": "Correlation Min Samples Fraction", "description": "Additional parameter that defines the minimum fraction of samples that is required to calculate\n correlation coefficient between two columns. As NaNs are ignored and correlation coefficient is calculated\n on top of non-NaN subset of rows for a pair of columns - this parameter could help to make the results\n more meaningful. Please see the reference of \"min_periods\" here:\n https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html\n ", "default": 0.3, "type": "number" }, "correlation_group_level_cutoff": { "title": "Correlation Group Level Cutoff", "description": "Sets cutoff for how many levels of neighbours to consider when building correlation groups.\n\n For example consider the following correlation matrix:\n\n .. code-block::\n\n a b c d\n a 1.0 0.8 0.8 0.7\n b 0.8 1.0 0 0\n c 0.8 0 1.0 0.8\n d 0.7 0 0.8 1.0\n\n Let's say, we use threshold as ``> 0.7``. 
In this case `a` is correlated strongly with `b` and `c`, and \n `c` correlated with `d`.\n\n When we set cutoff to `1`, we use direct neighbours only, so there is one group `'a', 'c', 'b'`. \n In this case `d` is not included, because the group has been already formed around `a` column.\n\n If we set it to `-1` or anything more than 1, we use all reachable neighbours. In this case, correlation \n group is formed as ``'a', 'c', 'b', 'd'`` due to fact that `d` is strongly correlated with `c`, disregarding \n it weak connection to `a`. As you can see, it will result in larger groups, and possibility to assign to the \n same group columns with correlation less than a threshold. It could reflect cross-correlation more\n naturally in some cases.\n ", "default": 1, "type": "integer" }, "correlation_key_names": { "title": "Correlation Key Names", "description": "Defines a list of biologically rational gene names (subst) that\n will be used for correlation groups naming. In case some of those names will appear in one of column names\n within the same correlation group - the result correlation group identifier will contain those names.", "default": [ "TP53", "KRAS", "CDKN2A", "CDKN2B", "PIK3CA", "ATM", "BRCA1", "SOX2", "GNAS2", "TERC", "STK11", "PDCD1", "LAG3", "TIGIT", "HAVCR2", "EOMES", "MTAP" ], "type": "array", "items": { "type": "string" } } }, "definitions": { "CorrelationType": { "title": "CorrelationType", "description": "Defines available correlation types.", "enum": [ "pearson", "spearman" ], "type": "string" } } }
- Fields
- field correlation_type: CorrelationType = 'spearman'
Type of correlation that will be used for removing correlated features.
- field correlation_threshold: float = 0.9
Defines a correlation threshold that will be used to identify “correlated” features.
- field correlation_min_samples_fraction: float = 0.3
Additional parameter that defines the minimum fraction of samples required to calculate the correlation coefficient between two columns. Because NaNs are ignored and the correlation coefficient is calculated on the non-NaN subset of rows for a pair of columns, this parameter can help make the results more meaningful. See the reference for “min_periods” here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.corr.html
- field correlation_group_level_cutoff: int = 1
Sets the cutoff for how many levels of neighbours to consider when building correlation groups. For example, consider the following correlation matrix:

         a    b    c    d
    a  1.0  0.8  0.8  0.7
    b  0.8  1.0  0    0
    c  0.8  0    1.0  0.8
    d  0.7  0    0.8  1.0

Suppose the threshold is > 0.7. In this case a is strongly correlated with b and c, and c is correlated with d.
When the cutoff is set to 1, only direct neighbours are used, so there is one group: ‘a’, ‘c’, ‘b’. In this case d is not included, because the group has already been formed around column a.
If it is set to -1 or anything more than 1, all reachable neighbours are used. In this case the correlation group is formed as ‘a’, ‘c’, ‘b’, ‘d’, because d is strongly correlated with c, regardless of its weak connection to a. This results in larger groups and makes it possible for columns with a correlation below the threshold to be assigned to the same group. It could reflect cross-correlation more naturally in some cases.
- field correlation_key_names: List[str] = ['TP53', 'KRAS', 'CDKN2A', 'CDKN2B', 'PIK3CA', 'ATM', 'BRCA1', 'SOX2', 'GNAS2', 'TERC', 'STK11', 'PDCD1', 'LAG3', 'TIGIT', 'HAVCR2', 'EOMES', 'MTAP']
Defines a list of biologically rational gene names (subst) that will be used for naming correlation groups. If any of those names appears in a column name within a correlation group, the resulting correlation group identifier will contain those names.
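The neighbour-level cutoff described for the correlation groups can be reproduced on the example matrix with a breadth-first search. This is a sketch, not logml's algorithm: `correlation_group` is a hypothetical helper, it compares raw coefficients with `>` as in the example, and it treats -1 as "no limit" and other positive values as a depth limit, which is one reading of the description.

```python
from collections import deque


def correlation_group(corr, seed, threshold=0.7, cutoff=1):
    """Collect columns reachable from `seed` via links with corr > threshold.

    cutoff=1 follows direct neighbours only; cutoff=-1 follows every
    reachable neighbour (assumed interpretation of the documented field).
    """
    group = {seed}
    queue = deque([(seed, 0)])
    while queue:
        column, level = queue.popleft()
        if cutoff >= 0 and level >= cutoff:
            continue  # do not expand beyond the allowed neighbour level
        for other, value in corr[column].items():
            if other not in group and value > threshold:
                group.add(other)
                queue.append((other, level + 1))
    return group


# Correlation matrix from the field description.
corr = {
    "a": {"a": 1.0, "b": 0.8, "c": 0.8, "d": 0.7},
    "b": {"a": 0.8, "b": 1.0, "c": 0.0, "d": 0.0},
    "c": {"a": 0.8, "b": 0.0, "c": 1.0, "d": 0.8},
    "d": {"a": 0.7, "b": 0.0, "c": 0.8, "d": 1.0},
}
direct = correlation_group(corr, "a", cutoff=1)
reachable = correlation_group(corr, "a", cutoff=-1)
```

With the cutoff at 1 the group around a contains a, b and c; with -1 it also picks up d through its link to c, matching the two cases described in the field documentation.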