logml.eda.artifacts.correlation
Classes
CorrelationGraph | Data structure that models a correlation graph.
CorrelationGroup | Wrapper for correlation group information that provides additional utilities.
CorrelationSummary | Wraps correlation artifacts.
- class logml.eda.artifacts.correlation.CorrelationGraph(node_labels: List[str])
Bases: object
Data structure that models a correlation graph.
Supported operations:
    - initialize the graph from a list of node labels
    - add an edge between two nodes
    - rank the graph’s nodes by degree
- list_adjacent_nodes(node_id: str) -> List[str]
Returns the list of nodes adjacent to the given node.
- add_edge(u_node: str, v_node: str)
Adds a given edge to the graph.
- rank_nodes_by_degree() -> List[str]
Returns the node labels sorted by degree (descending), then by label (ascending).
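The operations listed for CorrelationGraph can be illustrated with a minimal sketch. This is not the logml implementation: the class name SimpleCorrelationGraph, the adjacency-set storage, the undirected edges, and the sorted neighbour lists are all assumptions made for illustration.

```python
from typing import Dict, List, Set


class SimpleCorrelationGraph:
    """Hypothetical stand-in for illustration; not the logml class."""

    def __init__(self, node_labels: List[str]) -> None:
        # One adjacency set per node label (assumption: the graph is undirected).
        self._adjacency: Dict[str, Set[str]] = {label: set() for label in node_labels}

    def add_edge(self, u_node: str, v_node: str) -> None:
        # Record the edge in both directions.
        self._adjacency[u_node].add(v_node)
        self._adjacency[v_node].add(u_node)

    def list_adjacent_nodes(self, node_id: str) -> List[str]:
        # Sorted only to make the output deterministic (assumption).
        return sorted(self._adjacency[node_id])

    def rank_nodes_by_degree(self) -> List[str]:
        # Degree descending, then label ascending for equal degrees.
        return sorted(
            self._adjacency,
            key=lambda label: (-len(self._adjacency[label]), label),
        )


graph = SimpleCorrelationGraph(["age", "height", "weight", "income"])
graph.add_edge("height", "weight")
graph.add_edge("age", "weight")
ranking = graph.rank_nodes_by_degree()
print(ranking)
```

Here "weight" has two neighbours, "age" and "height" have one each, and "income" has none, so the label decides the order between "age" and "height".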
- class logml.eda.artifacts.correlation.CorrelationGroup(features: List[str], group_id: int = 0, key_names: Optional[List[str]] = None, weigths: Optional[List[float]] = None)
Bases: object
Wrapper for correlation group information that provides additional utilities.
- GROUP_PREFIX = 'CG'
- property group_prefix
Returns the code name of the correlation group.
- get_group_name(max_key_names: int = 5) -> str
Returns an alias for the correlation group.
If any key names are found among the group’s features, only those present are used; otherwise the first feature’s name is used.
- get_features_intersection(target_features: List[str]) -> List[str]
Returns the intersection of the given feature list and the correlation group’s features.
- get_main_feature(target_features: List[str]) -> str
Returns one feature from a given list that will represent the whole correlation group.
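The naming and intersection behaviour described for CorrelationGroup can be sketched with two standalone functions. Both are hypothetical helpers, not logml code: the function names, the ", " separator used to join key names, and the choice to keep results in the group's own feature order are assumptions.

```python
from typing import List, Optional


def features_intersection(group_features: List[str], target_features: List[str]) -> List[str]:
    # Keep the group's feature order (assumption) and drop features not requested.
    targets = set(target_features)
    return [feature for feature in group_features if feature in targets]


def group_name(
    group_features: List[str],
    key_names: Optional[List[str]] = None,
    max_key_names: int = 5,
) -> str:
    # Use only the key names that actually occur in the group.
    present = [name for name in (key_names or []) if name in group_features]
    if present:
        # Joining with ", " is an assumption about the alias format.
        return ", ".join(present[:max_key_names])
    # No key name is present: fall back to the first feature's name.
    return group_features[0]


features = ["bmi", "weight", "waist"]
common = features_intersection(features, ["waist", "age", "bmi"])
named_by_key = group_name(features, key_names=["weight", "glucose"])
named_by_first = group_name(features)
print(common, named_by_key, named_by_first)
```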
- class logml.eda.artifacts.correlation.CorrelationSummary
Bases: object
Wraps correlation artifacts.
The following artifacts are available:
    - Pearson/Spearman correlation matrix (computed on numerical columns)
    - Linkage matrix: complete linkage based on the correlation matrix
    - Similarity order: the original columns ordered, based on the linkage matrix, by the similarity of their correlation matrix columns
    - Correlation graph
    - Correlation groups
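How a correlation matrix could lead to a correlation graph and correlation groups is sketched below in plain Python. The source does not say how logml derives the graph or the groups, so the 0.8 absolute-correlation threshold, the use of connected components as groups, and all function names are assumptions; the linkage matrix and similarity order are left out of the sketch.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    # Pearson correlation coefficient of two equally long numeric columns.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(columns: Dict[str, List[float]], threshold: float = 0.8) -> List[Tuple[str, str]]:
    # Graph edges: column pairs whose absolute correlation reaches the
    # threshold (the threshold value is an assumption).
    return [
        (u, v)
        for u, v in combinations(columns, 2)
        if abs(pearson(columns[u], columns[v])) >= threshold
    ]


def connected_groups(labels: List[str], edges: List[Tuple[str, str]]) -> List[List[str]]:
    # Groups: connected components of the graph (assumption), found by
    # depth-first search; isolated columns form single-member groups.
    adjacency = {label: set() for label in labels}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, groups = set(), []
    for label in labels:
        if label in seen:
            continue
        seen.add(label)
        stack, component = [label], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                stack.append(neighbour)
        groups.append(sorted(component))
    return groups


columns = {
    "height": [150.0, 160.0, 170.0, 180.0],
    "weight": [50.0, 60.0, 70.0, 80.0],
    "noise": [1.0, -1.0, -1.0, 1.0],
}
edges = correlated_pairs(columns)
groups = connected_groups(list(columns), edges)
print(edges, groups)
```

In this toy data "height" and "weight" are perfectly correlated while "noise" is uncorrelated with both, so the graph has a single edge and "noise" ends up in a group of its own.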
- LABEL = 'correlation'
- corr_groups_to_df() -> Optional[pandas.core.frame.DataFrame]
Returns the correlation groups as a DataFrame.