logml.configuration.cross_validation

Functions

make_cv_params([stratify, cv_type, n_folds, ...])

Prepare the CV class type and its params.

Classes

CVSplitType(value)

Type of CV splits: k-fold or shuffle

class logml.configuration.cross_validation.CVSplitType(value)

Bases: logml.common.StrEnum

Type of CV splits: k-fold or shuffle

KFOLD = 'kfold'
SHUFFLE = 'shuffle'

class logml.configuration.cross_validation.CrossValidationSection

Bases: pydantic.main.BaseModel

Configure CV application for the dataset.

JSON schema:
{
   "title": "CrossValidationSection",
   "description": "Configure CV application for the dataset.",
   "type": "object",
   "properties": {
      "random_state": {
         "title": "Random State",
         "description": "State to initialize random numbers generation.",
         "type": "integer"
      },
      "split_type": {
         "description": "Configures coverage of splits. 'kfold' covers dataset completely, 'shuffle' - does not guarantee it due to sampling.",
         "default": "kfold",
         "allOf": [
            {
               "$ref": "#/definitions/CVSplitType"
            }
         ]
      },
      "n_folds": {
         "title": "N Folds",
         "description": "How many CV folds should be produced.",
         "default": 20,
         "type": "integer"
      },
      "test_size": {
         "title": "Test Size",
         "description": "Which portion of the dataset to leave for evaluation of the fold.",
         "default": 0.2,
         "type": "number"
      },
      "type": {
         "title": "Type",
         "description": "To be set automatically. Cross Validation strategy alias to use (\"kfold\", \"stratifiedkfold\", etc.). Reference: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection",
         "default": "",
         "type": "string"
      },
      "params": {
         "title": "Params",
         "description": "To be set automatically.Parameters that will be passed to corresponding Scikit-learn classes. Please refer to the official Scikit-learn documentation for details.",
         "default": {},
         "type": "object"
      }
   },
   "definitions": {
      "CVSplitType": {
         "title": "CVSplitType",
         "description": "Type of CV splits: k-fold or shuffle",
         "enum": [
            "kfold",
            "shuffle"
         ],
         "type": "string"
      }
   }
}

Fields
field random_state: Optional[int] = None

Seed used to initialize random number generation.

field split_type: logml.configuration.cross_validation.CVSplitType = CVSplitType.KFOLD

Controls how the splits cover the dataset. ‘kfold’ partitions the dataset, so every sample is used for evaluation; ‘shuffle’ draws each split by random sampling, so complete coverage is not guaranteed.
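The coverage difference between the two split types can be illustrated without any library. The following is a minimal standard-library sketch, not logml or Scikit-learn code; the helper names (kfold_test_indices, shuffle_test_indices, coverage) are hypothetical and exist only for this illustration.

```python
import random


def kfold_test_indices(n_samples, n_folds):
    """Yield the test indices of each fold for a contiguous k-fold partition."""
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        yield list(range(start, start + size))
        start += size


def shuffle_test_indices(n_samples, n_folds, test_size, seed=None):
    """Yield an independently sampled test set for each fold."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_size))
    for _ in range(n_folds):
        # Each draw ignores previous draws, so test sets may overlap.
        yield rng.sample(range(n_samples), n_test)


def coverage(folds, n_samples):
    """Fraction of samples that appear in at least one test set."""
    seen = set()
    for test in folds:
        seen.update(test)
    return len(seen) / n_samples


n = 50
kfold_cov = coverage(kfold_test_indices(n, 5), n)
shuffle_cov = coverage(shuffle_test_indices(n, 5, 0.2, seed=0), n)
print(f"kfold coverage:   {kfold_cov:.2f}")
print(f"shuffle coverage: {shuffle_cov:.2f}")
```

The k-fold partition always reaches a coverage of 1.0, while the sampled splits usually fall short of it because some samples are drawn more than once and others never.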

field n_folds: int = 20

Number of CV folds to produce.

field test_size: float = 0.2

Fraction of the dataset held out for evaluation in each fold.

field type: str = ''

Set automatically. Cross-validation strategy alias to use (“kfold”, “stratifiedkfold”, etc.). Reference: https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection

field params: dict = {}

Set automatically. Parameters that will be passed to the corresponding Scikit-learn classes. Please refer to the official Scikit-learn documentation for details.
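To make the defaults concrete, here is a small sketch of how a partially specified section could resolve to full values. It uses only the standard library and copies the defaults from the schema and field list above; resolve_section is a hypothetical helper and does not reproduce the validation that the pydantic model itself performs.

```python
import copy

# Defaults copied from the documented fields of CrossValidationSection.
FIELD_DEFAULTS = {
    "random_state": None,
    "split_type": "kfold",
    "n_folds": 20,
    "test_size": 0.2,
    "type": "",
    "params": {},
}

# Values of the CVSplitType enum.
ALLOWED_SPLIT_TYPES = {"kfold", "shuffle"}


def resolve_section(user_values):
    """Overlay user-supplied values on the documented defaults (hypothetical helper)."""
    # Deep copy so the mutable `params` default is never shared between results.
    resolved = copy.deepcopy(FIELD_DEFAULTS)
    resolved.update(user_values)
    if resolved["split_type"] not in ALLOWED_SPLIT_TYPES:
        raise ValueError(f"unknown split_type: {resolved['split_type']!r}")
    return resolved


section = resolve_section({"split_type": "shuffle", "n_folds": 10})
print(section)
```

Only split_type and n_folds are overridden here; test_size, random_state, type, and params keep their documented defaults.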

get_cv_params(objective: Optional[logml.common.ModelingTask] = None, generator_type: Optional[str] = None) → Tuple[str, dict]

Returns CV class type and parameters.

logml.configuration.cross_validation.make_cv_params(stratify: bool = False, cv_type: logml.configuration.cross_validation.CVSplitType = CVSplitType.KFOLD, n_folds=100, test_size=0.25, random_state=None, generator_type: Optional[str] = None) → Tuple[str, Dict]

Prepare the CV class type and its params.
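The page does not show how the (type, params) pair is built, so the following is only a guess at the general shape, written as a standalone sketch. SplitType and sketch_cv_params are hypothetical stand-ins, not logml code. The aliases "kfold" and "stratifiedkfold" come from the field description above; "shufflesplit" and "stratifiedshufflesplit" are assumed here by analogy with the Scikit-learn class names. The parameter keys follow the constructor arguments of Scikit-learn's KFold and ShuffleSplit families, and generator_type is left out because its meaning is not documented on this page.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class SplitType(str, Enum):
    """Stand-in for CVSplitType (hypothetical, for illustration only)."""

    KFOLD = "kfold"
    SHUFFLE = "shuffle"


def sketch_cv_params(
    stratify: bool = False,
    cv_type: SplitType = SplitType.KFOLD,
    n_folds: int = 100,
    test_size: float = 0.25,
    random_state: Optional[int] = None,
) -> Tuple[str, Dict]:
    """Return an assumed (alias, params) pair; not the library's actual logic."""
    cv_type = SplitType(cv_type)  # also accepts the plain strings "kfold" / "shuffle"
    prefix = "stratified" if stratify else ""
    if cv_type is SplitType.KFOLD:
        params = {
            "n_splits": n_folds,
            # KFold only accepts a random_state when shuffling is enabled.
            "shuffle": random_state is not None,
            "random_state": random_state,
        }
        return prefix + "kfold", params
    # Assumed alias for the ShuffleSplit family; test_size only applies here.
    params = {
        "n_splits": n_folds,
        "test_size": test_size,
        "random_state": random_state,
    }
    return prefix + "shufflesplit", params


print(sketch_cv_params())
print(sketch_cv_params(stratify=True, cv_type="shuffle", n_folds=5, random_state=7))
```

The defaults in the sketch mirror the documented signature of make_cv_params (n_folds=100, test_size=0.25), which differ from the CrossValidationSection field defaults (20 and 0.2).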