logml.model_search.selection

Model training and selection.

Functions

compare_model_with_baseline(model, baseline)

Compare the loss of a model with that of a baseline model.

loss_ks_test(sample, baseline)

Kolmogorov-Smirnov test of whether the sample distribution is less than the baseline distribution.

loss_u_test(sample, baseline)

Mann-Whitney U-test of whether the sample distribution is less than the baseline distribution.

Classes

ModelSelection([config, objective_config, ...])

Trains all models and performs selection.

logml.model_search.selection.loss_u_test(sample: numpy.typing.ArrayLike, baseline: numpy.typing.ArrayLike) -> float

Mann-Whitney U-test of whether the sample distribution is less than the baseline distribution.
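The one-sided test described above can be sketched in plain Python. This is a minimal illustration and not logml's implementation: the names `average_ranks` and `u_test_less` are hypothetical, and the p-value uses the normal approximation with tie and continuity corrections. A library routine such as `scipy.stats.mannwhitneyu` with `alternative="less"` performs the same kind of test; whether logml calls it is not stated here.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def u_test_less(sample, baseline):
    """Approximate p-value for H1: `sample` tends to be smaller than `baseline`.

    Hypothetical helper, not the logml function.
    """
    n1, n2 = len(sample), len(baseline)
    combined = list(sample) + list(baseline)
    ranks = average_ranks(combined)

    # U statistic of the sample: small when the sample values rank low.
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2

    # Variance of U under H0, corrected for ties.
    n = n1 + n2
    counts = {}
    for value in combined:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t**3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all values identical: no evidence either way

    z = (u1 - mean_u + 0.5) / sqrt(var_u)  # +0.5 is the continuity correction
    return NormalDist().cdf(z)
```

A small p-value supports the claim that the sample losses are lower than the baseline losses. For very small samples an exact test is usually preferable to the normal approximation used here.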

logml.model_search.selection.loss_ks_test(sample: numpy.typing.ArrayLike, baseline: numpy.typing.ArrayLike) -> float

Kolmogorov-Smirnov test of whether the sample distribution is less than the baseline distribution.
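A one-sided two-sample Kolmogorov-Smirnov test can be sketched as follows. This is an illustration under stated assumptions, not the logml code: `ks_test_less` is a hypothetical name, "less" is taken to mean that the sample values tend to be smaller than the baseline values, and the p-value is the leading-order asymptotic approximation. Note that in `scipy.stats.ks_2samp` the `alternative` argument refers to the empirical CDFs rather than to the values, so the direction should be checked carefully when using a library routine.

```python
from bisect import bisect_right
from math import exp


def ks_test_less(sample, baseline):
    """Approximate p-value for H1: `sample` values tend to be smaller.

    Hypothetical helper, not the logml function.
    """
    xs, ys = sorted(sample), sorted(baseline)
    n, m = len(xs), len(ys)

    # If the sample is shifted toward smaller values, its empirical CDF lies
    # above the baseline's. D+ is the largest such gap; it is attained at one
    # of the observed data points.
    d_plus = max(
        bisect_right(xs, v) / n - bisect_right(ys, v) / m
        for v in xs + ys
    )
    d_plus = max(d_plus, 0.0)

    # Leading-order asymptotic tail probability for the one-sided statistic.
    return min(1.0, exp(-2.0 * d_plus**2 * n * m / (n + m)))
```

Unlike the U-test, this statistic reacts to any region where one empirical CDF exceeds the other, so it is sensitive to differences in shape as well as location.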

logml.model_search.selection.compare_model_with_baseline(model: logml.model_search.common.ModelEvaluationData, baseline: logml.model_search.common.ModelEvaluationData, min_test_size_limit=7, pvalue_threshold=0.01) -> dict

Compare the loss of a model with that of a baseline model.

If the number of raw losses is less than min_test_size_limit, the median values are compared; otherwise a one-sided U-test checks whether the model's loss distribution is statistically lower than the baseline's.

Returns

dict with the fields: test name, select (bool), pvalue.

Return type

dict
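The decision rule described for compare_model_with_baseline can be sketched as below. Everything here is an assumption made for illustration: the function `compare_losses`, the dictionary keys, the use of the smaller of the two sample sizes for the size check, selecting when the p-value falls below `pvalue_threshold`, and the simplified normal-approximation U-test without tie correction. The real function takes ModelEvaluationData objects, whose structure is not shown here, so plain lists of losses are used instead.

```python
from math import sqrt
from statistics import NormalDist, median


def u_test_less(sample, baseline):
    """Normal-approximation p-value for H1: sample tends to be smaller."""
    n1, n2 = len(sample), len(baseline)
    # U counts pairs where the sample value exceeds the baseline value;
    # a tied pair counts as one half.
    u = sum((x > y) + 0.5 * (x == y) for x in sample for y in baseline)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return NormalDist().cdf((u - mean_u + 0.5) / sd_u)


def compare_losses(model_losses, baseline_losses,
                   min_test_size_limit=7, pvalue_threshold=0.01):
    """Hypothetical stand-in operating on plain lists of raw losses."""
    # Assumption: the size check uses the smaller of the two samples.
    if min(len(model_losses), len(baseline_losses)) < min_test_size_limit:
        # Too few losses for a meaningful test: fall back to the medians.
        return {
            "test_name": "median",
            "select": median(model_losses) < median(baseline_losses),
            "pvalue": None,
        }
    pvalue = u_test_less(model_losses, baseline_losses)
    # Assumption: the model is selected when the test is significant.
    return {
        "test_name": "u_test",
        "select": pvalue < pvalue_threshold,
        "pvalue": pvalue,
    }


# Three folds each: below the size limit, so the medians decide.
print(compare_losses([0.1, 0.2, 0.3], [0.4, 0.5, 0.6]))
```

With the default limit of 7, a typical 5-fold cross-validation run would take the median branch, while repeated cross-validation with ten or more losses per model would reach the U-test.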

class logml.model_search.selection.ModelSelection(config: Optional[logml.configuration.modeling.ModelSearchSection] = None, objective_config: Optional[logml.configuration.modeling.ModelingTaskSpec] = None, hpo_config: Optional[logml.configuration.modeling.HPOSection] = None, model_provider: Optional[logml.model_search.provider.ModelProvider] = None, logger=None, dump_hpo_data=True, show_progressbar=True, min_test_size_limit=7)

Bases: object

Trains all models and performs selection.

run(dataset: logml.data.datasets.cv_dataset.ModelingDataset)

Run model selection.

get_model_config(model_name: str) -> Optional[logml.configuration.modeling.ModelSelectionConfig]

Returns the ModelSelectionConfig for the given model name.

train_and_evaluate(model_config: logml.configuration.modeling.ModelSelectionConfig, dataset: logml.data.datasets.cv_dataset.ModelingDataset, ds_name=None, dump_result=True) -> logml.model_search.common.ModelEvaluationData

Run hyperparameter optimization (HPO), then train and evaluate a model.