logml.data.datasets.cv_dataset

Classes

ModelingDataset(*dont_use_positional_args[, ...])

Modeling dataset.

class logml.data.datasets.cv_dataset.ModelingDataset(*dont_use_positional_args, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, dataframe: Optional[pandas.core.frame.DataFrame] = None, objective_cfg: Optional[logml.configuration.modeling.ModelingTaskSpec] = None, cross_validator: Optional[Union[sklearn.model_selection._split.BaseCrossValidator, Iterable]] = None, features: Optional[List[str]] = None, logger=None, **kwargs)

Bases: logml.data.datasets.base.BaseDataset, logml.data.datasets.base.CrossValidationMixin

Modeling dataset. Combines data, metadata, modeling objective and CV.
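
The cross_validator argument accepts either a scikit-learn cross-validator or an iterable. As a rough illustration, scikit-learn's own convention for a cv iterable is a sequence of (train, test) index pairs. Whether ModelingDataset expects exactly this shape is an assumption, and the helper make_kfold_pairs below is hypothetical and not part of logml.

```python
from typing import List, Tuple


def make_kfold_pairs(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical helper: contiguous k-fold (train, test) index pairs."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    # The first (n_samples % n_folds) folds receive one extra sample.
    base, extra = divmod(n_samples, n_folds)
    pairs = []
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        pairs.append((train, test))
        start += size
    return pairs


splits = make_kfold_pairs(n_samples=7, n_folds=3)
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Every sample lands in exactly one test fold, which is the property a cross-validation split is generally expected to have.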

LABEL = 'cv_dataset'

update_target_values()

Updates the list of unique target values. Applies to classification problems only. Call this immediately after any manipulation of the target columns.
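
Why the refresh matters can be seen in a small stand-alone sketch. It does not use logml: TinyClassificationData is a hypothetical stand-in that caches the unique labels, which is an assumption about how the real class behaves internally.

```python
from typing import List


class TinyClassificationData:
    """Hypothetical stand-in, not the logml class."""

    def __init__(self, target: List[str]) -> None:
        self.target = list(target)
        self.target_labels: List[str] = []
        self.update_target_values()

    def update_target_values(self) -> None:
        # Recompute the cached list of unique labels from the current target.
        self.target_labels = sorted(set(self.target))


data = TinyClassificationData(["yes", "no", "maybe", "no"])
labels_before = list(data.target_labels)

# Manipulate the target column: merge "maybe" into "no".
data.target = ["no" if value == "maybe" else value for value in data.target]
stale_labels = list(data.target_labels)  # the cache still lists "maybe"

data.update_target_values()
fresh_labels = list(data.target_labels)
```

Until the refresh is called, the cached labels describe the target as it was before the manipulation.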

property task

Public accessor for the internal _task attribute.

property target_column

Public accessor for the internal _target attribute.

property target_metric

Public accessor for the internal _target_metric attribute.

property target_labels

Public accessor for the internal _target_labels attribute. NOTE: Applicable only to classification problems.

property features

Returns the list of features (also known as modeling covariates or input variables).

get_features_list() -> List[str]

Returns the list of feature columns. DEPRECATED: use the features property instead.

set_features_list(features: List[str]) -> None

Sets the list of feature columns.

get_feature_values(feature_name: str) -> numpy.array

Returns values of a given feature.

get_features_matrix() -> numpy.ndarray

Returns the X array, i.e. the covariate matrix.

get_target_values() -> numpy.ndarray

Returns modeling target (y column) for the current dataframe.
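
The relationship between the feature matrix and the target can be sketched without logml. In the example below a plain dict of columns stands in for the pandas dataframe and nested lists stand in for NumPy arrays. The function split_features_and_target and the column names are hypothetical.

```python
from typing import Dict, List, Tuple

Table = Dict[str, List[float]]


def split_features_and_target(
    table: Table, features: List[str], target_column: str
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: build a row-major X matrix and a y vector."""
    n_rows = len(table[target_column])
    # One row per sample, one column per feature, in the order given.
    X = [[table[name][i] for name in features] for i in range(n_rows)]
    y = list(table[target_column])
    return X, y


table = {
    "age": [34.0, 51.0, 28.0],
    "dose": [1.0, 2.5, 1.5],
    "response": [0.0, 1.0, 0.0],
}
X, y = split_features_and_target(table, ["age", "dose"], "response")
```

X has one row per sample and one column per feature, while y holds one value per sample taken from the target column.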

property cv_dataframe: pandas.core.frame.DataFrame

Returns CV dataframe.

property cv_features: numpy.array

Returns features from CV dataframe.

property cv_targets: numpy.array

Returns targets from CV dataframe.

get_target_columns() -> List

Returns list of target columns.

drop(features: List[str], copy_object=False) -> logml.data.datasets.cv_dataset.ModelingDataset

Drops the given features.

select(features: List[str], copy_object=False) -> logml.data.datasets.cv_dataset.ModelingDataset

Limits the feature set to the given features.
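
drop and select are complementary ways of narrowing the feature list. The sketch below shows that relationship on plain lists of names. The functions drop_features and select_features are hypothetical, and details such as preserving the original order or rejecting unknown names are assumptions rather than documented logml behavior.

```python
from typing import List


def drop_features(current: List[str], to_drop: List[str]) -> List[str]:
    """Hypothetical: remove the named features, keeping the rest in order."""
    dropped = set(to_drop)
    return [name for name in current if name not in dropped]


def select_features(current: List[str], to_keep: List[str]) -> List[str]:
    """Hypothetical: keep only the named features, in their original order."""
    unknown = [name for name in to_keep if name not in current]
    if unknown:
        raise KeyError(f"unknown features: {unknown}")
    kept = set(to_keep)
    return [name for name in current if name in kept]


all_features = ["age", "dose", "weight", "site"]
after_drop = drop_features(all_features, ["site"])
after_select = select_features(all_features, ["weight", "age"])
```

Dropping a set of names and selecting its complement give the same feature list under these assumptions.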

clone(deep=False)

Clones the current dataset.

Parameters

deep – When True, the new object is created using ‘copy.deepcopy’. Otherwise, a new dataset object is created that uses the same dataframe as the current one.
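
The practical difference between the two modes is whether the clone shares its data with the original. The sketch below illustrates this with a hypothetical TinyDataset class that holds a dict of columns in place of a pandas dataframe. It follows the description of deep given here but is not the logml implementation.

```python
import copy
from typing import Dict, List


class TinyDataset:
    """Hypothetical stand-in, not the logml class."""

    def __init__(self, dataframe: Dict[str, List[int]], features: List[str]) -> None:
        self.dataframe = dataframe
        self.features = list(features)

    def clone(self, deep: bool = False) -> "TinyDataset":
        if deep:
            # Fully independent copy, including the underlying data.
            return copy.deepcopy(self)
        # New object that reuses the same underlying data container.
        return TinyDataset(self.dataframe, self.features)


original = TinyDataset({"age": [34, 51]}, ["age"])
shallow = original.clone()
deep = original.clone(deep=True)

# Mutating the original's data is visible through the shallow clone only.
original.dataframe["age"][0] = 99
shallow_value = shallow.dataframe["age"][0]
deep_value = deep.dataframe["age"][0]
```

A shallow clone is cheaper but remains coupled to the original's data, so in-place changes to the shared dataframe affect both objects.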

Returns: