EDA Artifact Types Registry

EDA Artifacts

Provides registry functionality for EDA artifact producers. For implementation details, see EligibleEDAArtifactsProducers.
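To illustrate what the registry has to do, the sketch below maps each artifact name on this page to its documented DEPENDENCIES and computes an order in which the producers could run, with dependencies first. This is a minimal, stdlib-only sketch. `ARTIFACT_DEPENDENCIES` and `resolve_order` are hypothetical names that are not part of LogML, and EligibleEDAArtifactsProducers may resolve dependencies differently.

```python
from typing import Dict, List

# Hypothetical registry: artifact name -> names of the artifacts it depends on.
# The lists mirror the DEPENDENCIES attributes documented on this page;
# "metadata" has no dependencies.
ARTIFACT_DEPENDENCIES: Dict[str, List[str]] = {
    "metadata": [],
    "correlation": ["metadata"],
    "dim_reduction": ["metadata"],
    "distributions": ["metadata"],
    "missingness": ["metadata"],
    "statistics": ["metadata"],
}


def resolve_order(requested: List[str]) -> List[str]:
    """Return the requested artifacts plus their dependencies, dependencies first.

    Depth-first topological sort; raises ValueError on unknown names or cycles.
    """
    order: List[str] = []
    state: Dict[str, str] = {}  # name -> "visiting" | "done"

    def visit(name: str) -> None:
        if name not in ARTIFACT_DEPENDENCIES:
            raise ValueError(f"unknown artifact: {name}")
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError(f"dependency cycle at: {name}")
        state[name] = "visiting"
        for dep in ARTIFACT_DEPENDENCIES[name]:
            visit(dep)
        state[name] = "done"
        order.append(name)

    for name in requested:
        visit(name)
    return order


print(resolve_order(["correlation", "missingness"]))
# prints ['metadata', 'correlation', 'missingness']
```

Because every producer except metadata depends on metadata, any valid order builds metadata first.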

correlation

Description

Produces:
- Pearson/Spearman correlation for numerical columns
- an ordering of the artifact by similarity, computed with AgglomerativeClustering
- a list of correlation groups

See logml.eda.artifacts.correlation.CorrelationSummary.

Dependencies:
- metadata artifact

For implementation details, see CorrelationSummaryProducer.
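The correlation summary can be sketched in plain Python. The sketch assumes numerical columns are given as equal-length lists with no missing values and no constant columns. Spearman correlation is computed as the Pearson correlation of average ranks. This page names AgglomerativeClustering for the ordering, but the sketch swaps in a small hand-written average-linkage agglomerative clustering on the distance 1 - |r| and uses the leaf order of the final merge as the column order. The linkage, the distance, and all function names (`pearson`, `ranks`, `spearman`, `cluster_order`) are assumptions for illustration. Correlation groups are not covered.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cluster_order(cols: Dict[str, List[float]],
                  corr: Callable[[Sequence[float], Sequence[float]], float] = pearson
                  ) -> List[str]:
    """Order columns by average-linkage agglomerative clustering on 1 - |r|."""
    names = list(cols)
    dist = {}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = 1 - abs(corr(cols[a], cols[b]))
    clusters = [[name] for name in names]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average pairwise distance.
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = sum(dist[(a, b)] for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]  # leaves of a merge stay contiguous
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


data = {"a": [1, 2, 3, 4, 5], "b": [10, 1, 7, 3, 8], "c": [2, 4, 6, 8, 11]}
print(cluster_order(data))  # ['b', 'a', 'c']: a and c (r ≈ 0.996) are adjacent
```

Concatenating the two merged clusters keeps strongly correlated columns next to each other. A library implementation may also flip subtrees, so its exact order can differ.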

Attributes
  • DEPENDENCIES: ['metadata']

dim_reduction

Description

Produces:
- PCA output (with explained variance and feature weights)
- t-SNE output
- LDA output
- MCA output

Dependencies:
- metadata artifact

For implementation details, see DimensionalityReductionSummaryProducer.
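PCA is the simplest of the four methods to sketch without dependencies. The example below is a hypothetical illustration, not the producer's code. It computes the sample covariance matrix, finds the top-k eigenvectors by power iteration with deflation, and reports each component's per-feature weights together with its explained-variance ratio (eigenvalue divided by the trace of the covariance matrix). The names `covariance`, `mat_vec`, and `pca` are assumptions, and t-SNE, LDA, and MCA are not shown.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def covariance(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance matrix of n observations (rows) x p features."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def mat_vec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def pca(rows: Sequence[Sequence[float]], k: int, iters: int = 500,
        seed: int = 0) -> Tuple[Matrix, List[float]]:
    """Top-k principal components by power iteration with deflation.

    Returns (components, explained_variance_ratio). Each component is a unit
    vector of per-feature weights. Assumes well-separated eigenvalues; power
    iteration converges slowly when the leading eigenvalues are close.
    """
    cov = covariance(rows)
    p = len(cov)
    total = sum(cov[i][i] for i in range(p))  # total variance = trace
    rng = random.Random(seed)
    components, ratios = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        for _ in range(iters):
            w = mat_vec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:  # remaining variance is exactly zero
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(cov, v)))  # Rayleigh quotient
        components.append(v)
        ratios.append(lam / total)
        # Deflate: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return components, ratios


comps, ratios = pca([[1, 0], [-1, 0], [0, 0.5], [0, -0.5]], k=2)
print([round(r, 3) for r in ratios])  # [0.8, 0.2]
```

Power iteration keeps the sketch dependency-free. A production implementation would normally use a proper eigendecomposition or SVD routine instead.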

Attributes
  • DEPENDENCIES: ['metadata']

distributions

Description

Produces:
- histograms for numerical columns
- a similarity ordering of the computed histograms

Dependencies:
- metadata artifact

For implementation details, see DistributionsSummaryProducer.
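The hypothetical sketch below covers both steps. First, each numerical column gets an equal-width histogram over its own range, normalised to proportions so that columns with different row counts are comparable. Second, the histograms are ordered by a greedy nearest-neighbour chain under L1 distance. The bin count, the distance, the greedy ordering, and the function names are assumptions; this page does not say how DistributionsSummaryProducer computes its similarity ordering.

```python
from typing import Dict, List, Sequence


def histogram(values: Sequence[float], bins: int = 10) -> List[float]:
    """Equal-width histogram over the column's own [min, max], as proportions."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # constant column: everything lands in bin 0
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # the max value goes into the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def l1(h1: Sequence[float], h2: Sequence[float]) -> float:
    """L1 (sum of absolute differences) distance between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def similarity_order(hists: Dict[str, List[float]]) -> List[str]:
    """Greedy chain: start at the first column, then repeatedly append the
    remaining histogram closest (in L1) to the last one appended."""
    remaining = list(hists)
    order = [remaining.pop(0)]
    while remaining:
        last = hists[order[-1]]
        nxt = min(remaining, key=lambda name: l1(last, hists[name]))
        remaining.remove(nxt)
        order.append(nxt)
    return order


print(histogram([0, 1, 2, 3], bins=2))  # [0.5, 0.5]
```

A greedy chain is cheap and keeps near-identical distributions adjacent, but unlike clustering it depends on the starting column.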

Attributes
  • DEPENDENCIES: ['metadata']

metadata

Description

Produces:
- list of numerical columns
- list of categorical columns
- list of all columns

Dependencies: none.

For implementation details, see MetadataProducer.
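The column split can be sketched as follows, assuming the table is a dict that maps column names to lists of values, with None marking missing entries. The classification rule (numerical if every non-missing value is a non-boolean real number) and the name `column_metadata` are hypothetical. MetadataProducer may instead rely on something like pandas dtypes.

```python
from numbers import Real
from typing import Any, Dict, List


def column_metadata(table: Dict[str, List[Any]]) -> Dict[str, List[str]]:
    """Split columns into numerical and categorical lists.

    Hypothetical rule: a column is numerical when every non-missing (non-None)
    value is a real number (bools excluded); otherwise it is categorical.
    Columns that are entirely missing end up as categorical.
    """
    numerical, categorical = [], []
    for name, values in table.items():
        present = [v for v in values if v is not None]
        if present and all(isinstance(v, Real) and not isinstance(v, bool)
                           for v in present):
            numerical.append(name)
        else:
            categorical.append(name)
    return {"numerical": numerical, "categorical": categorical, "all": list(table)}


meta = column_metadata({"age": [31, None, 45.5], "sex": ["F", "M", None]})
print(meta["numerical"], meta["categorical"])  # ['age'] ['sex']
```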

Attributes

missingness

Description

Produces:
- per-column missing-value summaries (for numerical, categorical, and all columns)
- per-row missing-value summaries (for numerical, categorical, and all columns)
- complete-dataset summaries (for numerical, categorical, and all columns)
- a similarity ordering based on pairwise NaN distances

Dependencies:
- metadata artifact

For implementation details, see MissingnessSummaryProducer.
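The sketch below computes summaries of the kinds listed above over a column-oriented table, with None as the missing marker. It is illustrative only. `missingness_summary` is a hypothetical name, and the "NaN distance" is assumed to be the fraction of rows in which exactly one of the two columns is missing. Restricting the summaries to numerical or categorical columns would just mean passing the matching subset of the table. A similarity ordering could then be derived from the pairwise distances, for example by clustering.

```python
from itertools import combinations
from typing import Any, Dict, List


def missingness_summary(table: Dict[str, List[Any]]) -> Dict[str, Any]:
    """Missing-value summaries for a column-oriented table (None = missing)."""
    names = list(table)
    n_rows = len(table[names[0]])
    # Boolean indicator per column: True where the value is missing.
    miss = {name: [v is None for v in table[name]] for name in names}
    per_column = {name: sum(m) / n_rows for name, m in miss.items()}
    per_row = [sum(miss[name][i] for name in names) for i in range(n_rows)]
    complete_rows = sum(1 for count in per_row if count == 0)
    # Hypothetical pairwise "NaN distance": share of rows where exactly one of
    # the two columns is missing (normalised Hamming distance of indicators).
    nan_distance = {
        (a, b): sum(x != y for x, y in zip(miss[a], miss[b])) / n_rows
        for a, b in combinations(names, 2)
    }
    return {"per_column": per_column, "per_row": per_row,
            "complete_rows": complete_rows, "nan_distance": nan_distance}


s = missingness_summary({"x": [1, None, 3, None], "y": [1, None, 3, 4]})
print(s["per_column"], s["complete_rows"])  # {'x': 0.5, 'y': 0.25} 2
```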

Attributes
  • DEPENDENCIES: ['metadata']

statistics

Description

Produces:
- multiple statistics for numerical columns

Dependencies:
- metadata artifact

For implementation details, see StatisticsSummaryProducer.
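This page does not list which statistics are produced. As an illustrative assumption only, the sketch below uses Python's standard `statistics` module to compute a handful of common descriptive statistics for one numerical column, skipping missing (None) values. `column_statistics` is a hypothetical name, and the column needs at least two non-missing values for the standard deviation.

```python
import statistics
from typing import Dict, List, Optional


def column_statistics(values: List[Optional[float]]) -> Dict[str, float]:
    """A few descriptive statistics for one numerical column, ignoring None."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "mean": statistics.fmean(present),
        "std": statistics.stdev(present),  # sample standard deviation (n - 1)
        "min": min(present),
        "median": statistics.median(present),
        "max": max(present),
    }


print(column_statistics([1, 2, None, 3, 4])["median"])  # 2.5
```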

Attributes
  • DEPENDENCIES: ['metadata']