logml.eda.artifacts_producers.dimensionality_reduction

Functions

get_n_components(pca)

Returns the number of principal components (PCs) needed to cover 95% of the variance.

Classes

DimensionalityReductionSummaryProducer(...)

Produces PCA, TSNE, LDA, and MCA outputs.

class logml.eda.artifacts_producers.dimensionality_reduction.DimensionalityReductionSummaryProducer(metadata_cfg: logml.configuration.modeling.ModelingTaskSpec, global_params: dict, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, logger=None, eda_params: Optional[logml.configuration.eda.EDAArtifactsGenerationParameters] = None)

Bases: logml.eda.artifacts_producers.base.BaseEDAArtifactsProducer

Produces:

  • PCA output (+explained variance, feature weights)

  • TSNE output

  • LDA output

  • MCA output

Dependencies:

  • metadata artifact

LABEL = 'dim_reduction'
DEPENDENCIES = ['metadata']
ALIAS = 'Dimensionality reduction summary producer'
get_numeric_columns(dataframe: pandas.core.frame.DataFrame, target_column: Optional[str] = None) → List[str]

Applies basic filtering to the list of numerical columns.
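The exact filtering rules are not documented here. As a rough illustration only, the sketch below assumes that "basic filtering" means keeping numeric-dtype columns and dropping the target column. The function name `numeric_columns_sketch` is hypothetical and is not part of logml.

```python
from typing import List, Optional

import pandas as pd


def numeric_columns_sketch(
    dataframe: pd.DataFrame, target_column: Optional[str] = None
) -> List[str]:
    """Hypothetical stand-in: numeric columns, excluding the target (assumed rule)."""
    # select_dtypes(include="number") keeps integer and float columns only.
    numeric = dataframe.select_dtypes(include="number").columns
    return [col for col in numeric if col != target_column]


df = pd.DataFrame(
    {"age": [31, 45], "site": ["A", "B"], "dose": [0.5, 1.5], "response": [0, 1]}
)
selected = numeric_columns_sketch(df, target_column="response")
```

The real method may apply further filters (for example on missing values or constant columns); none of that is assumed here.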

produce(dataframe: pandas.core.frame.DataFrame)

Creates and dumps the EDA artifact for a given dataframe.

logml.eda.artifacts_producers.dimensionality_reduction.get_n_components(pca: sklearn.decomposition._pca.PCA) → Tuple[int, numpy.array]

Returns the number of principal components (PCs) needed to cover 95% of the variance.
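One common way to compute such a count from a fitted scikit-learn PCA is to accumulate `explained_variance_ratio_` and find the first position where the running total reaches the threshold. The sketch below illustrates that idea; it is not the logml implementation. The name `n_components_for_variance`, the `threshold` parameter, and the choice to return the cumulative variance array as the second tuple element are all assumptions.

```python
from typing import Tuple

import numpy as np
from sklearn.decomposition import PCA


def n_components_for_variance(
    pca: PCA, threshold: float = 0.95
) -> Tuple[int, np.ndarray]:
    """Hypothetical helper: smallest PC count whose cumulative variance ratio reaches threshold."""
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # searchsorted gives the first index with cumulative[index] >= threshold.
    n_components = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against rounding leaving the final total just below the threshold.
    n_components = min(n_components, len(cumulative))
    return n_components, cumulative


# Synthetic data with very different variances per feature.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) * np.array([10.0, 5.0, 1.0, 0.5, 0.1])
pca = PCA().fit(X)
n, cumulative = n_components_for_variance(pca)
```

Here `cumulative[n - 1]` is the fraction of variance retained when only the first `n` components are kept.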