logml.eda.artifacts.correlation

Classes

CorrelationGraph(node_labels)

Data structure that models a correlation graph.

CorrelationGroup(features[, group_id, ...])

Wrapper for correlation group information that provides additional utilities.

CorrelationSummary()

Wraps correlation artifacts.

class logml.eda.artifacts.correlation.CorrelationGraph(node_labels: List[str])

Bases: object

Data structure that models a correlation graph.

Operations supported:

  1. initialization based on a list of nodes

  2. an edge can be added between two nodes

  3. graph’s nodes can be ranked by degree

list_adjacent_nodes(node_id: str) → List[str]

Returns the list of nodes adjacent to the given node.

add_edge(u_node: str, v_node: str)

Adds a given edge to the graph.

rank_nodes_by_degree() → List[str]

Returns the node labels ordered by degree (descending), with ties broken by label (ascending).
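The three operations listed above can be sketched with an adjacency-set structure. This is a minimal illustration, not the logml implementation: the class name SimpleCorrelationGraph, the undirected-edge behaviour, and the sorted output of list_adjacent_nodes are assumptions made for the example.

```python
from typing import Dict, List, Set


class SimpleCorrelationGraph:
    """Illustrative stand-in for a correlation graph (hypothetical class)."""

    def __init__(self, node_labels: List[str]) -> None:
        # Every node starts with an empty set of neighbours.
        self._adjacency: Dict[str, Set[str]] = {label: set() for label in node_labels}

    def add_edge(self, u_node: str, v_node: str) -> None:
        # Assumption: edges are undirected, so both endpoints are updated.
        self._adjacency[u_node].add(v_node)
        self._adjacency[v_node].add(u_node)

    def list_adjacent_nodes(self, node_id: str) -> List[str]:
        # Sorted only to make the output deterministic in this sketch.
        return sorted(self._adjacency[node_id])

    def rank_nodes_by_degree(self) -> List[str]:
        # Degree descending, then label ascending for ties.
        return sorted(self._adjacency, key=lambda n: (-len(self._adjacency[n]), n))


graph = SimpleCorrelationGraph(["age", "height", "weight", "income"])
graph.add_edge("height", "weight")
graph.add_edge("age", "weight")
ranking = graph.rank_nodes_by_degree()
print(ranking)  # ['weight', 'age', 'height', 'income']
```

Here "weight" has degree 2 and comes first, "age" and "height" tie at degree 1 and are ordered by label, and the isolated node "income" comes last.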

class logml.eda.artifacts.correlation.CorrelationGroup(features: List[str], group_id: int = 0, key_names: Optional[List[str]] = None, weigths: Optional[List[float]] = None)

Bases: object

Wrapper for correlation group information that provides additional utilities.

GROUP_PREFIX = 'CG'

property group_prefix

Returns the code name of the correlation group.

get_group_name(max_key_names: int = 5) → str

Returns an alias for the correlation group.

If any of the key names are found among the group’s features, only those that are present are used; otherwise the first feature’s name is used.
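The selection rule above can be sketched as a standalone function. This is a hedged illustration only: the function name pick_alias_names is hypothetical, and treating max_key_names as a cap on how many key names are kept is an assumption. How the chosen names are combined into the final alias string is not documented here, so the sketch stops at selecting them.

```python
from typing import List, Optional


def pick_alias_names(
    features: List[str],
    key_names: Optional[List[str]] = None,
    max_key_names: int = 5,
) -> List[str]:
    """Choose the names an alias would be built from (hypothetical helper)."""
    # Keep only the key names that actually occur in the group's features.
    present = [name for name in (key_names or []) if name in features]
    if present:
        # Assumption: max_key_names limits how many key names are used.
        return present[:max_key_names]
    # No key name is present: fall back to the first feature's name.
    return features[:1]


with_keys = pick_alias_names(["a", "b", "c"], key_names=["c", "z"])
without_keys = pick_alias_names(["a", "b"], key_names=None)
```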

get_features_intersection(target_features: List[str]) → List[str]

Returns the intersection of the given list of features and the correlation group’s features.
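As a rough illustration of the intersection, the sketch below keeps the features of the target list that also belong to the group. The function name features_intersection and the choice to preserve the order of target_features are assumptions; the actual method may order its result differently.

```python
from typing import List


def features_intersection(group_features: List[str], target_features: List[str]) -> List[str]:
    """Features of target_features that also belong to the group (hypothetical helper)."""
    group = set(group_features)  # set membership keeps the lookup cheap
    # Assumption: the result follows the order of target_features.
    return [feature for feature in target_features if feature in group]


common = features_intersection(["height", "weight", "bmi"], ["age", "bmi", "height"])
```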

get_main_feature(target_features: List[str]) → str

Returns one feature from a given list that will represent the whole correlation group.

class logml.eda.artifacts.correlation.CorrelationSummary

Bases: object

Wraps correlation artifacts.

The following artifacts are available:

  • Pearson/Spearman correlation matrix (on top of numerical columns)

  • Linkage matrix - complete linkage based on correlation matrix

  • Similarity order - based on the linkage matrix, original columns are ordered by similarity of correlation matrix columns

  • Correlation graph

  • Correlation groups
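The first three artifacts in the list above can be reproduced approximately with pandas and SciPy. This is a sketch under stated assumptions, not the code logml runs: the toy column names are hypothetical, and converting correlation to a distance as 1 - |r| before clustering is an assumption about how the linkage input is built.

```python
import pandas as pd
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform

# Toy numerical data with hypothetical column names.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.1, 9.8],  # increases together with "a"
    "c": [5.0, 3.0, 4.0, 1.0, 2.0],
    "d": [4.8, 3.1, 4.2, 0.9, 2.2],  # moves together with "c"
})

# Correlation matrix over the numerical columns (method="pearson" also works).
corr = df.corr(method="spearman")

# Assumption: distance = 1 - |r|, clipped at zero to absorb rounding error.
dist = (1.0 - corr.abs()).clip(lower=0.0)
condensed = squareform(dist.to_numpy(), checks=False)

# Complete-linkage hierarchical clustering on the condensed distances.
link = linkage(condensed, method="complete")

# Similarity order: columns in the leaf order of the resulting dendrogram.
similarity_order = [corr.columns[i] for i in leaves_list(link)]
print(similarity_order)
```

With this data, "a" and "b" end up next to each other in similarity_order, as do "c" and "d", because each pair is perfectly rank-correlated.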

LABEL = 'correlation'

corr_groups_to_df() → Optional[pandas.core.frame.DataFrame]

Returns the correlation groups as a DataFrame.