logml.eda.artifacts_producers.missingness
Functions
| get_complete_dateset_summary | Returns a mapping between the number of columns and the maximum number of rows that have no NaNs within a column subset of that size. |
| get_missingness_summary_per_axes | Returns a summary for missing values per column. |
Classes
| MissingnessSummaryProducer | Produces: |
- class logml.eda.artifacts_producers.missingness.MissingnessSummaryProducer(metadata_cfg: logml.configuration.modeling.ModelingTaskSpec, global_params: dict, dataset_metadata: Optional[logml.data.metadata.DatasetMetadata] = None, logger=None, eda_params: Optional[logml.configuration.eda.EDAArtifactsGenerationParameters] = None)
Bases: logml.eda.artifacts_producers.base.BaseEDAArtifactsProducer
Produces:
    - missing-value summaries per column (for num/cat/all columns)
    - missing-value summaries per row (for num/cat/all columns)
    - complete dataset summaries (for num/cat/all columns)
    - similarity order by pairwise NaN distances
Dependencies:
    - metadata artifact
- LABEL = 'missingness'
- DEPENDENCIES = ['metadata']
- ALIAS = 'Missingness summary producer'
- produce(dataframe: pandas.core.frame.DataFrame)
Generates the missing-data EDA artifacts for the given dataframe.
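The "similarity order by pairwise NaN distances" item is not specified further in this reference. As a rough illustration only, the sketch below orders columns so that columns with similar missingness patterns end up next to each other. The helper name `order_columns_by_nan_similarity`, the choice of Hamming distance between NaN masks, and the greedy nearest-neighbour chaining are all assumptions made for this example, not the logml implementation.

```python
import numpy as np
import pandas as pd


def order_columns_by_nan_similarity(df: pd.DataFrame):
    """Hypothetical helper (not part of logml).

    Returns a column order in which neighbours have similar NaN patterns,
    together with the pairwise distance matrix that was used.
    """
    # 1.0 where a value is missing, 0.0 otherwise; shape is (rows, columns).
    mask = df.isna().to_numpy(dtype=float)
    n_cols = mask.shape[1]

    # Hamming distance between the NaN masks of every pair of columns:
    # the fraction of rows where exactly one of the two columns is missing.
    dist = np.abs(mask[:, :, None] - mask[:, None, :]).mean(axis=0)

    # Greedy chain (an assumed ordering rule): start at the first column and
    # repeatedly move to the closest column that has not been visited yet.
    order = [0]
    remaining = set(range(1, n_cols))
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: (dist[last, j], j))
        order.append(nxt)
        remaining.remove(nxt)

    ordered_names = [df.columns[i] for i in order]
    dist_frame = pd.DataFrame(dist, index=df.columns, columns=df.columns)
    return ordered_names, dist_frame


df = pd.DataFrame({
    "a": [np.nan, np.nan, 1.0, 2.0],
    "b": [1.0, 2.0, np.nan, 3.0],
    "c": [np.nan, np.nan, 1.0, np.nan],
})
order, distances = order_columns_by_nan_similarity(df)
print(order)
```

Here `a` and `c` are missing in mostly the same rows, so they are placed next to each other, while `b` ends up last.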
- logml.eda.artifacts_producers.missingness.get_missingness_summary_per_axes(dataframe: pandas.core.frame.DataFrame, target_columns: List[str]) -> Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame]
Returns a summary for missing values per column.
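The signature returns a tuple of two DataFrames, but their exact layout is not documented here. The following is a minimal sketch of what a per-axis missingness summary could look like in plain pandas, assuming one frame describes columns and the other describes rows. The helper name `missingness_per_axis` and the output column names `n_missing` and `fraction_missing` are hypothetical.

```python
import numpy as np
import pandas as pd


def missingness_per_axis(df: pd.DataFrame, columns: list):
    """Hypothetical helper (not part of logml): count NaNs along both axes."""
    is_na = df[columns].isna()

    # One summary row per column: how many values are missing in it.
    per_column = pd.DataFrame({
        "n_missing": is_na.sum(axis=0),
        "fraction_missing": is_na.mean(axis=0),
    })

    # One summary row per dataset row: how many of the selected columns
    # are missing in it.
    per_row = pd.DataFrame({
        "n_missing": is_na.sum(axis=1),
        "fraction_missing": is_na.mean(axis=1),
    })
    return per_column, per_row


df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": [1.0, 2.0, 3.0, np.nan],
})
per_column, per_row = missingness_per_axis(df, ["a", "b", "c"])
print(per_column)
print(per_row)
```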
- logml.eda.artifacts_producers.missingness.get_complete_dateset_summary(dataframe: pandas.core.frame.DataFrame, target_columns: List[str]) -> pandas.core.frame.DataFrame
Returns a mapping between the number of columns and the maximum number of rows that have no NaNs within a column subset of that size.
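To make the description concrete, here is a small brute-force sketch of such a mapping. It tries every column subset of each size, so it is only practical for a handful of columns. How logml computes the result is not stated in this reference, and the helper name `complete_rows_by_subset_size` and the output column names are assumptions for illustration.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def complete_rows_by_subset_size(df: pd.DataFrame, columns: list) -> pd.DataFrame:
    """Hypothetical helper (not part of logml).

    For every subset size k, find the largest number of rows that are fully
    observed (no NaNs) in some k-column subset. Exhaustive search, so the
    cost grows exponentially with the number of columns.
    """
    not_null = df[columns].notna()
    records = []
    for k in range(1, len(columns) + 1):
        # A row is complete for a subset if every column in it is non-null.
        best = max(
            int(not_null[list(subset)].all(axis=1).sum())
            for subset in combinations(columns, k)
        )
        records.append({"n_columns": k, "max_complete_rows": best})
    return pd.DataFrame(records)


df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": [1.0, 2.0, 3.0, np.nan],
})
summary = complete_rows_by_subset_size(df, ["a", "b", "c"])
print(summary)
```

For this toy frame, the best single column has 3 complete rows, the best pair has 2, and only 1 row is complete across all three columns.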