logml.analysis.items.modeling

ML Analysis and related routines

Functions

get_stratum(global_cfg, stratum_id)

Returns the Strata corresponding to a given stratum id.

stratum_high_resources_estimator(res[, ...])

Default estimate of high resource requirements, based on the number of dataset rows and columns.

Classes

AggregateFeatureImportanceStep(*args, **kwargs)

Feature importance (FI) aggregation step.

EdaStep(*args, **kwargs)

Exploratory data analysis (EDA) step.

ExtractFeatureImportanceStep(*args, **kwargs)

Feature importance (FI) extraction step.

GenerateReportStep(*args, **kwargs)

Generate Report Step.

ModelingAnalysisResult([selected_models, ...])

Modeling analysis result.

ModelingItemBase(*args, **kwargs)

Wraps an analysis algorithm with data loading/saving and configuration.

ModelingTransformer(*args, **kwargs)

Wraps an analysis algorithm with data loading/saving and configuration.

PackageReportStep(*args, **kwargs)

Package report step

SelectModelsStep(*args, **kwargs)

Models selection step.

SurvivalAnalysisResult([stratum_id, ...])

Survival step result

SurvivalAnalysisStep(*args, **kwargs)

Survival analysis step.

TrainModelStep(*args, **kwargs)

Train Model Step.

logml.analysis.items.modeling.get_stratum(global_cfg, stratum_id: str) → Optional[logml.configuration.stratification.Strata]

Returns the Strata corresponding to a given stratum id.

class logml.analysis.items.modeling.ModelingConfigBase

Bases: pydantic.main.BaseModel

Config with common modeling params

JSON schema:
{
   "title": "ModelingConfigBase",
   "description": "Config with common modeling params",
   "type": "object",
   "properties": {
      "stratum_id": {
         "title": "Stratum Id",
         "type": "string"
      },
      "problem_id": {
         "title": "Problem Id",
         "type": "string"
      }
   },
   "required": [
      "stratum_id"
   ]
}

Fields
field stratum_id: str [Required]
field problem_id: Optional[str] = None
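
As a rough illustration of what the schema above implies, the sketch below checks a plain parameter dict against the "required" list and the declared types of the ModelingConfigBase schema. It uses only the standard library and imports neither logml nor pydantic; the helper name check_params and the sample values are hypothetical, and real validation is done by pydantic itself.

```python
# Schema transcribed from the ModelingConfigBase entry (title/description omitted).
MODELING_CONFIG_BASE_SCHEMA = {
    "type": "object",
    "properties": {
        "stratum_id": {"title": "Stratum Id", "type": "string"},
        "problem_id": {"title": "Problem Id", "type": "string"},
    },
    "required": ["stratum_id"],
}

# JSON schema type names mapped to Python types (only those used on this page).
JSON_TYPES = {"string": str, "integer": int}


def check_params(schema: dict, params: dict) -> list:
    """Hypothetical helper: list the problems found in a params dict."""
    problems = []
    # Every name in "required" must be present.
    for name in schema.get("required", []):
        if name not in params:
            problems.append(f"missing required field: {name}")
    # Present values must match the declared type; None is accepted here
    # to mirror Optional fields such as problem_id.
    for name, value in params.items():
        spec = schema["properties"].get(name)
        if spec is None or value is None:
            continue
        if not isinstance(value, JSON_TYPES[spec["type"]]):
            problems.append(f"wrong type for field: {name}")
    return problems


# Sample values are invented for illustration.
ok = check_params(MODELING_CONFIG_BASE_SCHEMA, {"stratum_id": "all_patients"})
bad = check_params(MODELING_CONFIG_BASE_SCHEMA, {"problem_id": "os"})
print(ok)
print(bad)
```

The same helper could be pointed at the derived schemas further down, which add n_dataset, dataset_n or model_name to the required list.
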
class logml.analysis.items.modeling.ModelingItemBase(*args, **kwargs)

Bases: logml.analysis.base_item.AnalysisItem

Wraps an analysis algorithm with data loading/saving and configuration.

make_params() → dict

Make global params for modeling.

abstract run()

See parent description.

abstract get_result()

See parent description.
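
The abstract run() and get_result() methods mean that ModelingItemBase cannot be used directly; each step supplies its own implementations. The sketch below shows that pattern with stand-in classes built on the standard abc module. ItemBaseSketch and EchoStepSketch are hypothetical names, the returned params are invented, and nothing here reproduces the real logml implementation.

```python
from abc import ABC, abstractmethod


class ItemBaseSketch(ABC):
    """Stand-in for ModelingItemBase; not the real logml class."""

    def make_params(self) -> dict:
        # The real method builds global modeling params; this content is invented.
        return {"example_key": "example_value"}

    @abstractmethod
    def run(self):
        """Execute the step."""

    @abstractmethod
    def get_result(self):
        """Return the step result, if any."""


class EchoStepSketch(ItemBaseSketch):
    """Hypothetical concrete step that only records the params it ran with."""

    def __init__(self):
        self.ran_with = None

    def run(self):
        self.ran_with = self.make_params()

    def get_result(self):
        # Mirrors the documented steps that save results directly
        # and return nothing.
        return None


step = EchoStepSketch()
step.run()

# Instantiating the base fails because its abstract methods are not implemented.
try:
    ItemBaseSketch()
    base_instantiable = True
except TypeError:
    base_instantiable = False
```
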

class logml.analysis.items.modeling.EdaStep(*args, **kwargs)

Bases: logml.analysis.items.modeling.ModelingItemBase

Exploratory data analysis (EDA) step.

LABEL = 'run_eda'
PARAMS_CLS

alias of logml.analysis.items.modeling.ModelingConfigBase

run()

Run modeling step

get_result()

Modeling saves results directly, nothing to return here.

classmethod estimate_resources(res: logml.analysis.common.JobResourcesReqs, cfg: GlobalConfig = None, df: Optional[pandas.core.frame.DataFrame] = None, strata_shapes: Optional[Dict[str, tuple]] = None, item_params: Any = None) → None

See parent description.

get_paths_to_release() → Optional[List[logml.analysis.base_item.ReleasePath]]

See parent description.

class logml.analysis.items.modeling.ModelingTransformerStepConfig

Bases: logml.analysis.items.modeling.ModelingConfigBase

Config for modeling dataset generation.

JSON schema:
{
   "title": "ModelingTransformerStepConfig",
   "description": "Config for modeling dataset generation.",
   "type": "object",
   "properties": {
      "stratum_id": {
         "title": "Stratum Id",
         "type": "string"
      },
      "problem_id": {
         "title": "Problem Id",
         "type": "string"
      },
      "n_dataset": {
         "title": "N Dataset",
         "type": "integer"
      }
   },
   "required": [
      "stratum_id",
      "n_dataset"
   ]
}

Fields
field n_dataset: int [Required]

class logml.analysis.items.modeling.ModelingTransformer(*args, **kwargs)

Bases: logml.analysis.items.modeling.ModelingItemBase

Wraps an analysis algorithm with data loading/saving and configuration.

LABEL = 'modeling_data_transform'
PARAMS_CLS

alias of logml.analysis.items.modeling.ModelingTransformerStepConfig

run()

Run modeling step

get_result()

Modeling saves results directly, nothing to return here.

classmethod estimate_resources(res: logml.analysis.common.JobResourcesReqs, cfg: GlobalConfig = None, df: Optional[pandas.core.frame.DataFrame] = None, strata_shapes: Optional[Dict[str, tuple]] = None, item_params: Any = None) → None

See parent description.

class logml.analysis.items.modeling.TrainModelStepConfig

Bases: logml.analysis.items.modeling.ModelingConfigBase

Config for model training.

JSON schema:
{
   "title": "TrainModelStepConfig",
   "description": "Config for model training.",
   "type": "object",
   "properties": {
      "stratum_id": {
         "title": "Stratum Id",
         "type": "string"
      },
      "problem_id": {
         "title": "Problem Id",
         "type": "string"
      },
      "model_name": {
         "title": "Model Name",
         "type": "string"
      }
   },
   "required": [
      "stratum_id",
      "model_name"
   ]
}

Fields
field model_name: str [Required]

class logml.analysis.items.modeling.TrainModelStep(*args, **kwargs)

Bases: logml.analysis.items.modeling.ModelingItemBase

Train Model Step.

LABEL = 'train_model'
PARAMS_CLS

alias of logml.analysis.items.modeling.TrainModelStepConfig

run()

Run modeling step

get_result()

Modeling saves results directly, nothing to return here.

classmethod estimate_resources(res: logml.analysis.common.JobResourcesReqs, cfg: GlobalConfig = None, df: Optional[pandas.core.frame.DataFrame] = None, strata_shapes: Optional[Dict[str, tuple]] = None, item_params: Any = None) → None

See parent description.

class logml.analysis.items.modeling.SelectModelsStep(*args, **kwargs)

Bases: logml.analysis.items.modeling.ModelingItemBase

Models selection step.

LABEL = 'select_models'
PARAMS_CLS

alias of logml.analysis.items.modeling.ModelingConfigBase

run()

Run modeling step

get_result()

Modeling saves results directly, nothing to return here.

class logml.analysis.items.modeling.ExtractFeatureImportanceStepConfig

Bases: logml.analysis.items.modeling.ModelingConfigBase

Parameters for FI extraction.

JSON schema:
{
   "title": "ExtractFeatureImportanceStepConfig",
   "description": "Parameters for FI extraction.",
   "type": "object",
   "properties": {
      "stratum_id": {
         "title": "Stratum Id",
         "type": "string"
      },
      "problem_id": {
         "title": "Problem Id",
         "type": "string"
      },
      "dataset_n": {
         "title": "Dataset N",
         "type": "integer"
      },
      "model_name": {
         "title": "Model Name",
         "type": "string"
      }
   },
   "required": [
      "stratum_id",
      "dataset_n",
      "model_name"
   ]
}

Fields
field dataset_n: int [Required]
field model_name: str [Required]

class logml.analysis.items.modeling.ExtractFeatureImportanceStep(*args, **kwargs)

Bases: logml.analysis.items.modeling.ModelingItemBase

Feature importance (FI) extraction step.

LABEL = 'extract_fi'
PARAMS_CLS

alias of logml.analysis.items.modeling.ExtractFeatureImportanceStepConfig

run()

Run feature selection step

get_result()

Modeling saves results directly, nothing to return here.

classmethod estimate_resources(res: logml.analysis.common.JobResourcesReqs, cfg: GlobalConfig = None, df: Optional[pandas.core.frame.DataFrame] = None, strata_shapes: Optional[Dict[str, tuple]] = None, item_params: Any = None) → None

See parent description.

class logml.analysis.items.modeling.ModelingAnalysisResult(selected_models: Optional[List[logml.model_search.common.ModelEvaluationData]] = None, fi_models: Optional[List[str]] = None, objective: Optional[logml.configuration.modeling.ModelingTaskSpec] = None, stratum_id: Optional[str] = None, problem_id: Optional[str] = None, num_features: int = -1, top_features: Optional[List[str]] = None, analysis_name: Optional[str] = None, unique_name: Optional[str] = None, status: Optional[str] = None, fi_relative_loss: Optional[Dict[str, float]] = None)

Bases: logml.analysis.base_item.AnalysisResult

Modeling analysis result.

class logml.analysis.items.modeling.AggregateFeatureImportanceStep(*args, **kwargs)

Bases: logml.analysis.items.modeling.ModelingItemBase

Feature importance (FI) aggregation step.

Note that this step is completely optional: in any case, all results are required. This step only generates a short summary table, which remains available even if the reporting step fails for some reason.

LABEL = 'combine_fi'
PARAMS_CLS

alias of logml.analysis.items.modeling.ModelingConfigBase

run()

See parent description.

generate_analysis_result(fi_runner, max_top_features_rank=30)

Generates (high-level) modeling analysis result

get_result()

Modeling saves results directly, nothing to return here.

generate_analysis_metadata() → Optional[logml.analysis.base_item.AnalysisMetadata]

Creates metadata object.

The main use of the metadata object is to point to high-level analysis results (for example, the final result of Survival Modeling, as opposed to a sub-level analysis step such as model search). Those results are then gathered and rendered on the summary report page.

get_paths_to_release() → Optional[List[logml.analysis.base_item.ReleasePath]]

See parent description.

class logml.analysis.items.modeling.SurvivalAnalysisResult(stratum_id: Optional[str] = None, problem_id: Optional[str] = None, num_features: int = -1, top_features: Optional[List[str]] = None, analysis_name: Optional[str] = None, unique_name: Optional[str] = None, status: Optional[str] = None)

Bases: logml.analysis.base_item.AnalysisResult

Survival step result

class logml.analysis.items.modeling.SurvivalAnalysisStep(*args, **kwargs)

Bases: logml.analysis.items.modeling.ModelingItemBase

Survival analysis step.

LABEL = 'survival_analysis'
PARAMS_CLS

alias of logml.analysis.items.modeling.ModelingConfigBase

run()

Run modeling step

generate_analysis_result()

Generate artifacts.

get_result()

Modeling saves results directly, nothing to return here.

generate_analysis_metadata() → Optional[logml.analysis.base_item.AnalysisMetadata]

Creates metadata object.

The main use of the metadata object is to point to high-level analysis results (for example, the final result of Survival Modeling, as opposed to a sub-level analysis step such as model search). Those results are then gathered and rendered on the summary report page.

class logml.analysis.items.modeling.GenerateReportStep(*args, **kwargs)

Bases: logml.analysis.items.modeling.ModelingItemBase

Generate Report Step.

LABEL = 'generate_report'
run()

Run modeling step

make_params() → dict

Make global params for modeling.

get_result()

See parent description.

classmethod estimate_resources(res: logml.analysis.common.JobResourcesReqs, cfg: GlobalConfig = None, df: Optional[pandas.core.frame.DataFrame] = None, strata_shapes: Optional[Dict[str, tuple]] = None, item_params: Any = None) → None

See parent description.

class logml.analysis.items.modeling.PackageReportStep(*args, **kwargs)

Bases: logml.analysis.items.modeling.ModelingItemBase

Package report step

LABEL = 'package_report'
run()

Run modeling step

make_params() → dict

Make global params for modeling.

get_result()

See parent description.
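
The LABEL and PARAMS_CLS values listed for the step classes above can be collected into a single lookup table, which may help when reading pipeline configs. The sketch below is a plain transcription of those values as strings; the dict name STEP_PARAMS and both helper functions are hypothetical and are not part of logml. None marks steps whose entry lists no PARAMS_CLS.

```python
from typing import List, Optional

# LABEL -> name of PARAMS_CLS, transcribed from the class entries on this page.
STEP_PARAMS = {
    "run_eda": "ModelingConfigBase",
    "modeling_data_transform": "ModelingTransformerStepConfig",
    "train_model": "TrainModelStepConfig",
    "select_models": "ModelingConfigBase",
    "extract_fi": "ExtractFeatureImportanceStepConfig",
    "combine_fi": "ModelingConfigBase",
    "survival_analysis": "ModelingConfigBase",
    "generate_report": None,
    "package_report": None,
}


def params_class_for(label: str) -> Optional[str]:
    """Hypothetical helper: config class name for a step label (KeyError if unknown)."""
    return STEP_PARAMS[label]


def labels_using(params_cls: str) -> List[str]:
    """Hypothetical helper: sorted labels of all steps that share a config class."""
    return sorted(label for label, cls in STEP_PARAMS.items() if cls == params_cls)


shared = labels_using("ModelingConfigBase")
print(shared)
```
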

logml.analysis.items.modeling.stratum_high_resources_estimator(res: logml.analysis.common.JobResourcesReqs, strata_shapes: Optional[Dict[str, tuple]] = None, params: Optional[Any] = None, cpu_mul: float = 1.0, mem_mul: float = 2.0)

Default estimate of high resource requirements, based on the number of dataset rows and columns.
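
The formula used by this estimator is not documented on this page. Purely to illustrate how a shape-based estimate with cpu_mul and mem_mul multipliers could work, the sketch below scales an in-place resources object by the size of the largest stratum. Everything in it is an assumption: ResourcesSketch and its attribute names are stand-ins for JobResourcesReqs, the 8-bytes-per-cell rule is invented, and params is accepted but unused.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ResourcesSketch:
    """Hypothetical stand-in for JobResourcesReqs; attribute names are invented."""
    cpus: float = 1.0
    memory_gb: float = 1.0


def high_resources_sketch(
    res: ResourcesSketch,
    strata_shapes: Optional[Dict[str, tuple]] = None,
    params: Optional[Any] = None,  # unused in this sketch
    cpu_mul: float = 1.0,
    mem_mul: float = 2.0,
) -> None:
    """Invented heuristic: update res in place from the largest stratum shape."""
    if not strata_shapes:
        return  # nothing known about the data, keep the current values
    # Pick the (rows, cols) pair with the most cells.
    rows, cols = max(strata_shapes.values(), key=lambda shape: shape[0] * shape[1])
    # Assumption: 8 bytes per cell (float64), converted to GiB.
    data_gb = rows * cols * 8 / 1024 ** 3
    res.memory_gb = max(res.memory_gb, data_gb * mem_mul)
    res.cpus = res.cpus * cpu_mul


res = ResourcesSketch()
shapes = {"small": (1_000, 10), "large": (1_000_000, 512)}  # invented shapes
high_resources_sketch(res, shapes, cpu_mul=4.0)
print(res)
```

Updating res in place and returning nothing matches the documented estimate_resources signatures, which also take a res argument and return None.
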