logml.eda.artifacts_producers.distributions

Functions

create_hist_features(histograms)

Retrieves histogram values and concatenates them.

get_histograms(dataframe, numeric_columns[, ...])

Returns histograms for a given list of numeric columns.

Classes

DistributionsSummaryProducer(metadata_cfg, ...)

Produces histograms for numerical columns and a similarity ordering of those histograms.

class logml.eda.artifacts_producers.distributions.DistributionsSummaryProducer(metadata_cfg: logml.configuration.modeling.ModelingTaskSpec, global_params: dict, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, logger=None, eda_params: Optional[logml.configuration.eda.EDAArtifactsGenerationParameters] = None)

Bases: logml.eda.artifacts_producers.base.BaseEDAArtifactsProducer

Produces:

  • histograms for numerical columns

  • similarity ordering for calculated histograms

Dependencies:

  • metadata artifact

LABEL = 'distributions'

DEPENDENCIES = ['metadata']

ALIAS = 'Distributions summary producer'

produce(dataframe: pandas.core.frame.DataFrame)

Creates and dumps EDA artifact for a given dataframe.
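
This page does not describe how the similarity ordering listed among the produced artifacts is computed. As an illustration only, the sketch below orders histogram rows greedily by nearest neighbour on normalized bin counts. The function name sketch_similarity_order, the Euclidean distance, and the greedy strategy are all assumptions made for this example, not the logml implementation.

```python
import numpy as np


def sketch_similarity_order(features: np.ndarray) -> list:
    """Hypothetical helper: greedy nearest-neighbour ordering of histogram rows.

    `features` holds one histogram per row (bin counts). This is NOT the
    logml algorithm, only one simple way to place similar histograms together.
    """
    if len(features) == 0:
        return []

    # Normalize each row to sum to 1 so that shape matters, not sample size.
    totals = features.sum(axis=1, keepdims=True)
    normalized = features / np.where(totals == 0, 1, totals)

    # Pairwise Euclidean distances between normalized histograms.
    diffs = normalized[:, None, :] - normalized[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))

    # Start from row 0 and repeatedly append the closest unvisited row.
    order = [0]
    remaining = set(range(1, len(features)))
    while remaining:
        last = order[-1]
        nearest = min(remaining, key=lambda idx: (distances[last, idx], idx))
        order.append(nearest)
        remaining.remove(nearest)
    return order


features = np.array([
    [10.0, 0.0, 0.0],  # all mass in the first bin
    [0.0, 0.0, 10.0],  # all mass in the last bin
    [8.0, 2.0, 0.0],   # similar in shape to row 0
])
order = sketch_similarity_order(features)
print(order)
```

With this input, row 2 is placed directly after row 0 because their normalized shapes are closest, and row 1 comes last.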

logml.eda.artifacts_producers.distributions.get_histograms(dataframe: pandas.core.frame.DataFrame, numeric_columns: List[str], bins: int = 30) → List[Tuple[str, Dict]]

Returns histograms for a given list of numeric columns.
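
To make the return type concrete, here is a minimal stand-in with the same signature, built on numpy.histogram. The name sketch_get_histograms, the dictionary keys "values" and "bins", and the dropping of missing values are assumptions for this example. The layout of the Dict returned by the real function is not documented on this page.

```python
from typing import Dict, List, Tuple

import numpy as np
import pandas as pd


def sketch_get_histograms(
    dataframe: pd.DataFrame, numeric_columns: List[str], bins: int = 30
) -> List[Tuple[str, Dict]]:
    """Hypothetical stand-in for get_histograms; not the logml implementation."""
    result = []
    for column in numeric_columns:
        # Assumption: missing values are ignored when counting.
        values = dataframe[column].dropna().to_numpy(dtype=float)
        counts, edges = np.histogram(values, bins=bins)
        # Assumption: each histogram is stored as counts plus bin edges.
        result.append((column, {"values": counts, "bins": edges}))
    return result


df = pd.DataFrame(
    {
        "age": [21, 35, 35, 48, 62],
        "income": [30.0, 42.5, None, 58.0, 91.0],
    }
)
histograms = sketch_get_histograms(df, ["age", "income"], bins=4)
for name, hist in histograms:
    print(name, hist["values"], hist["bins"])
```

Each tuple pairs a column name with its histogram, matching the List[Tuple[str, Dict]] shape in the signature.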

logml.eda.artifacts_producers.distributions.create_hist_features(histograms: Dict[str, Dict[str, numpy.array]]) → numpy.array

Retrieves histogram values and concatenates them.
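
The sketch below shows one plausible reading of "concatenates": the count arrays are stacked into a matrix with one row per column. The name sketch_create_hist_features, the "values" key, and the choice of row-wise stacking (rather than a flat 1-D array) are assumptions for this example. The real function may behave differently.

```python
from typing import Dict

import numpy as np


def sketch_create_hist_features(
    histograms: Dict[str, Dict[str, np.ndarray]]
) -> np.ndarray:
    """Hypothetical stand-in for create_hist_features; not the logml code."""
    # Assumption: counts live under the "values" key of each histogram dict.
    rows = [np.asarray(hist["values"], dtype=float) for hist in histograms.values()]
    # Assumption: every histogram has the same number of bins, so rows align.
    return np.vstack(rows)


example_histograms = {
    "age": {"values": np.array([1, 2, 3]), "bins": np.array([0.0, 1.0, 2.0, 3.0])},
    "income": {"values": np.array([4, 0, 1]), "bins": np.array([0.0, 10.0, 20.0, 30.0])},
}
hist_features = sketch_create_hist_features(example_histograms)
print(hist_features.shape)
```

Stacking row-wise keeps one feature vector per column, which is a convenient form for comparing histograms with each other.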