logml.report.plotters.feature_importance

Functions

`build_bootstrapping_result_plots`(result)	Produces custom plots for a given model assuming we use its bootstrapped coefficients as feature importance.
`build_fi_rankings_heatmap`(results, title, ...)	For a set of FI rankings (per method) shows a heatmap.
`build_fi_rankings_heatmap_h`(results, title)	For a set of FI rankings (per method) shows a heatmap.
`filter_ranks_df`(df[, max_rank])	Filters ranks dataset by rank.
`generate_buttons`(dataframe, target, features)	Returns a list with simple plotly-compatible buttons configurations.
`plot_stata_similarity_heatmaps`(col_width, ...)	Plot strata similarity heatmaps.
`plot_strata_ranks_clustermap`(df[, ...])	Display clustermap of strata features ranking.
`show_categorical_target_vs_numerical_features_associations`(...)	Creates an interactive view for examining target vs features relationships.
`show_cross_strata_fi_rankings_scatter`(data, ...)	Simple scatter plot displaying average rankings for two given strata.
`show_numerical_target_vs_categorical_features_associations`(...)	Creates an interactive view for examining target vs features relationships.
`show_numerical_target_vs_numerical_features_associations`(...)	Creates an interactive view for examining target vs features relationships.
`show_ranking_scatter_plot`(rank_data)	Shows ranks for all pairs of strata.

logml.report.plotters.feature_importance.build_fi_rankings_heatmap(results: pandas.core.frame.DataFrame, title: str, **kwargs): For a set of FI rankings (per method) shows a heatmap. The brighter the color, the better.

logml.report.plotters.feature_importance.build_fi_rankings_heatmap_h(results: pandas.core.frame.DataFrame, title: str, max_name_len=22, **kwargs)

For a set of FI rankings (per method) shows a heatmap. The brighter the color, the better.

Differs from build_fi_rankings_heatmap by the horizontal orientation of the chart, because plotly provides sliders only for the x-axis.

logml.report.plotters.feature_importance.build_bootstrapping_result_plots(result: pandas.core.frame.DataFrame)

Produces custom plots for a given model assuming we use its bootstrapped coefficients as feature importance.

Before plotting, the (frequency, coefficient) pairs are filtered as follows:

coefficients

a) If any RANDOM features were used, the maximal absolute value of their coefficients is used as the threshold.
b) Otherwise, 1e-2 is used as the threshold for absolute values.

frequency

a) If there are enough (> 20) features with frequency > 0.7, only those are plotted.
b) Otherwise, all features are plotted.
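The filtering rules above can be sketched in pandas as follows. This is only an illustration of the described thresholds, not the library's code: the function name, the column names `coefficient` and `frequency`, the convention that random features are recognized by a `RANDOM` prefix in the index, the strict `>` comparison, and the order of the two filters are all assumptions.

```python
import pandas as pd


def filter_bootstrap_results(
    result: pd.DataFrame,
    coef_col: str = "coefficient",   # hypothetical column name
    freq_col: str = "frequency",     # hypothetical column name
    random_prefix: str = "RANDOM",   # assumed naming of random features
) -> pd.DataFrame:
    """Sketch of the coefficient and frequency filtering described above."""
    # Coefficient threshold: max |coef| among random features, else 1e-2.
    is_random = result.index.astype(str).str.startswith(random_prefix)
    if is_random.any():
        threshold = result.loc[is_random, coef_col].abs().max()
    else:
        threshold = 1e-2
    kept = result[result[coef_col].abs() > threshold]

    # Frequency rule: if more than 20 features have frequency > 0.7,
    # keep only those; otherwise keep everything that passed so far.
    frequent = kept[kept[freq_col] > 0.7]
    return frequent if len(frequent) > 20 else kept


example = pd.DataFrame(
    {"coefficient": [0.5, -0.3, 0.005, 0.05], "frequency": [0.9, 0.8, 0.95, 0.4]},
    index=["age", "bmi", "height", "RANDOM_1"],
)
filtered = filter_bootstrap_results(example)
```

In this example the random feature sets the threshold to 0.05, so only `age` and `bmi` survive; with so few features the frequency rule falls back to keeping all of them.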

Plots produced:

Summary plot - lists all resulting features as a table.

Barchart plot - to visually compare feature coefficient magnitudes.

Scatter plot - to visually assess coefficients and frequencies.

logml.report.plotters.feature_importance.show_cross_strata_fi_rankings_scatter(data: pandas.core.frame.DataFrame, x_column: str, y_column: str): Simple scatter plot displaying average rankings for two given strata.

logml.report.plotters.feature_importance.show_ranking_scatter_plot(rank_data: pandas.core.frame.DataFrame): Shows ranks for all pairs of strata.

logml.report.plotters.feature_importance.filter_ranks_df(df, max_rank=30): Filters the ranks dataset by rank. Expected df format: index = feature names, columns = strata names, values = ranks. Ranks are 1-based.
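To make the expected input format concrete, the snippet below builds a small ranks DataFrame (features in the index, strata in the columns, 1-based ranks as values) and applies a stand-in filter. The helper `keep_top_ranked` is hypothetical, and its rule (keep a feature if it is ranked within `max_rank` in at least one stratum) is an assumption; the source does not state the exact criterion used by `filter_ranks_df`.

```python
import pandas as pd

# Ranks in the documented format: index = feature names,
# columns = strata names, values = 1-based ranks.
ranks = pd.DataFrame(
    {"stratum_a": [1, 2, 3, 4], "stratum_b": [3, 1, 4, 2]},
    index=["age", "bmi", "height", "weight"],
)


def keep_top_ranked(df: pd.DataFrame, max_rank: int = 30) -> pd.DataFrame:
    """Hypothetical stand-in: keep features ranked <= max_rank in any stratum."""
    return df[(df <= max_rank).any(axis=1)]


top = keep_top_ranked(ranks, max_rank=2)
```

Here `height` is dropped because it is ranked 3 and 4, outside the top 2 in both strata.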

logml.report.plotters.feature_importance.plot_strata_ranks_clustermap(df, cols_filter=None, replace_empty=None, max_rank=None, cmap='Greens_r', figsize=(12, 22), title=None): Display clustermap of strata features ranking.

logml.report.plotters.feature_importance.plot_stata_similarity_heatmaps(col_width, labels, plots_per_row, row_height, stats_data): Plot strata similarity heatmaps.

logml.report.plotters.feature_importance.show_numerical_target_vs_numerical_features_associations(dataframe: pandas.core.frame.DataFrame, target: str, features: List[str])

Creates an interactive view for examining target vs features relationships.

Target column is assumed to be numerical. Features are assumed to be numerical.

logml.report.plotters.feature_importance.show_numerical_target_vs_categorical_features_associations(dataframe: pandas.core.frame.DataFrame, target: str, features: List[str])

Creates an interactive view for examining target vs features relationships.

Target column is assumed to be numerical. Features are assumed to be categorical.

logml.report.plotters.feature_importance.show_categorical_target_vs_numerical_features_associations(dataframe: pandas.core.frame.DataFrame, target: str, features: List[str])

Creates an interactive view for examining target vs features relationships.

Target column is assumed to be categorical. Features are assumed to be numerical.

logml.report.plotters.feature_importance.generate_buttons(dataframe: pandas.core.frame.DataFrame, target: str, features: List[str], swap_axes: bool = False, **kwargs) → List[Dict]: Returns a list with simple plotly-compatible buttons configurations.
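For readers unfamiliar with plotly button configurations, the sketch below shows what such a list of dicts can look like. It is not the implementation of `generate_buttons`: the helper `make_feature_buttons` is hypothetical, and it assumes a figure with one trace per feature, where each button shows only its own trace and updates the axis titles. The `label`, `method` and `args` keys follow plotly's `updatemenus` button format.

```python
from typing import Dict, List


def make_feature_buttons(
    target: str, features: List[str], swap_axes: bool = False
) -> List[Dict]:
    """Hypothetical sketch: one plotly button per feature."""
    buttons = []
    for i, feature in enumerate(features):
        # By default the feature goes on x and the target on y.
        x_title, y_title = (target, feature) if swap_axes else (feature, target)
        buttons.append(
            {
                "label": feature,
                "method": "update",
                "args": [
                    # Data update: show only the trace of this feature.
                    {"visible": [j == i for j in range(len(features))]},
                    # Layout update: relabel the axes.
                    {"xaxis.title.text": x_title, "yaxis.title.text": y_title},
                ],
            }
        )
    return buttons


buttons = make_feature_buttons("outcome", ["age", "bmi"])
```

A list like this would typically be passed to a figure through `fig.update_layout(updatemenus=[{"buttons": buttons}])`.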