logml.report.plotters.feature_importance

Functions

build_bootstrapping_result_plots(result)

Produces custom plots for a given model, assuming its bootstrapped coefficients are used as feature importance.

build_fi_rankings_heatmap(results, title, ...)

Shows a heatmap for a set of feature importance (FI) rankings, one per method.

build_fi_rankings_heatmap_h(results, title)

Shows a heatmap for a set of feature importance (FI) rankings, one per method.

filter_ranks_df(df[, max_rank])

Filters the ranks dataset by rank.

generate_buttons(dataframe, target, features)

Returns a list of simple plotly-compatible button configurations.

plot_stata_similarity_heatmaps(col_width, ...)

Plots strata similarity heatmaps.

plot_strata_ranks_clustermap(df[, ...])

Display clustermap of strata features ranking.

show_categorical_target_vs_numerical_features_associations(...)

Creates an interactive view for examining target vs features relationships.

show_cross_strata_fi_rankings_scatter(data, ...)

Simple scatter plot displaying averaged rankings for two given strata.

show_numerical_target_vs_categorical_features_associations(...)

Creates an interactive view for examining target vs features relationships.

show_numerical_target_vs_numerical_features_associations(...)

Creates an interactive view for examining target vs features relationships.

show_ranking_scatter_plot(rank_data)

Shows ranks for all pairs of strata.

logml.report.plotters.feature_importance.build_fi_rankings_heatmap(results: pandas.core.frame.DataFrame, title: str, **kwargs)

Shows a heatmap for a set of feature importance (FI) rankings, one per method. The brighter the color, the better.

logml.report.plotters.feature_importance.build_fi_rankings_heatmap_h(results: pandas.core.frame.DataFrame, title: str, max_name_len=22, **kwargs)

Shows a heatmap for a set of feature importance (FI) rankings, one per method. The brighter the color, the better.

Differs from build_fi_rankings_heatmap in the horizontal orientation of the chart, because plotly provides range sliders only for the x-axis.
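To illustrate why the horizontal orientation matters, here is a minimal sketch of the data preparation such a chart might need. It assumes (not confirmed by this reference) that `results` has features as rows and FI methods as columns, and that `max_name_len` truncates feature names. Transposing puts the features on the x-axis, where a plotly range slider can be attached through the layout attribute `xaxis.rangeslider`. The helper `to_horizontal_layout` is hypothetical and not part of logml.

```python
import pandas as pd


def to_horizontal_layout(results: pd.DataFrame, max_name_len: int = 22):
    """Hypothetical helper, not part of logml.

    Assumes rows are features, columns are FI methods, values are ranks.
    """
    # Transpose so features run along the x-axis (methods become rows).
    horizontal = results.T
    # Assumption: long feature names are truncated to keep tick labels readable.
    horizontal.columns = [str(name)[:max_name_len] for name in horizontal.columns]
    # Plain-dict plotly layout; range sliders are an x-axis attribute only.
    layout = {"xaxis": {"rangeslider": {"visible": True}}}
    return horizontal, layout
```

The returned layout dictionary can be merged into a plotly figure layout. Plotly does not offer the equivalent attribute on the y-axis.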

logml.report.plotters.feature_importance.build_bootstrapping_result_plots(result: pandas.core.frame.DataFrame)

Produces custom plots for a given model, assuming its bootstrapped coefficients are used as feature importance.

Before plotting, the (frequency, coefficient) pairs are filtered as follows:

  1. Coefficients

     a) If any RANDOM features were used, the maximum absolute value of their coefficients is used as the threshold.
     b) Otherwise, 1e-2 is used as the threshold for filtering absolute values.

  2. Frequency

     a) If there are enough (more than 20) features with frequency > 0.7, only those are plotted.
     b) Otherwise, all features are plotted.
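The two filtering rules above can be sketched with pandas. The following assumptions are not stated in this reference: the result has columns named `coefficient` and `frequency`, RANDOM features are recognized by an index-name prefix, the frequency rule is applied after the coefficient rule, and coefficients must strictly exceed the threshold. The function `filter_bootstrap_result` is hypothetical.

```python
import pandas as pd


def filter_bootstrap_result(result: pd.DataFrame, random_prefix: str = "RANDOM") -> pd.DataFrame:
    """Hypothetical sketch of the documented filtering rules.

    Assumes columns "coefficient" and "frequency", and that RANDOM
    features are identified by an index name starting with random_prefix.
    """
    is_random = result.index.astype(str).str.startswith(random_prefix)

    # 1. Coefficients: threshold is the max |coef| among RANDOM features, else 1e-2.
    if is_random.any():
        threshold = result.loc[is_random, "coefficient"].abs().max()
    else:
        threshold = 1e-2
    # Strict ">" (assumption) also drops the RANDOM features themselves.
    kept = result[result["coefficient"].abs() > threshold]

    # 2. Frequency: keep only frequent features if there are more than 20 of them.
    frequent = kept[kept["frequency"] > 0.7]
    return frequent if len(frequent) > 20 else kept
```

With this reading, no RANDOM feature survives step 1, because each one is at most its own maximum.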

Plots produced:

  1. Summary plot: a table enumerating all features in the result.

  2. Bar chart: to visually compare the magnitudes of feature coefficients.

  3. Scatter plot: to visually assess coefficients against frequencies.

logml.report.plotters.feature_importance.show_cross_strata_fi_rankings_scatter(data: pandas.core.frame.DataFrame, x_column: str, y_column: str)

Simple scatter plot displaying averaged rankings for two given strata.

logml.report.plotters.feature_importance.show_ranking_scatter_plot(rank_data: pandas.core.frame.DataFrame)

Shows ranks for all pairs of strata.
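Showing ranks for all pairs of strata means enumerating every unordered pair of strata columns. With n strata there are n(n-1)/2 pairs. The sketch below assumes each column of `rank_data` holds one stratum's ranks. One could plausibly pass each pair as `x_column` and `y_column` to show_cross_strata_fi_rankings_scatter, although this reference does not confirm that the two functions are wired together. `strata_pairs` is a hypothetical helper.

```python
from itertools import combinations

import pandas as pd


def strata_pairs(rank_data: pd.DataFrame):
    """Hypothetical helper: all unordered (x_column, y_column) strata pairs.

    Assumes each column of rank_data holds the ranks for one stratum.
    """
    return list(combinations(rank_data.columns, 2))
```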

logml.report.plotters.feature_importance.filter_ranks_df(df, max_rank=30)

Filters the ranks dataset by rank. The expected df format is:

  - index: feature names
  - columns: strata names
  - values: ranks (1-based)
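This reference does not say exactly which rows are kept. A minimal sketch, assuming a feature survives if it ranks within the top `max_rank` (1 is best) in at least one stratum, could look like this. `filter_ranks_sketch` is hypothetical and may differ from the real filter_ranks_df.

```python
import pandas as pd


def filter_ranks_sketch(df: pd.DataFrame, max_rank: int = 30) -> pd.DataFrame:
    """Hypothetical sketch: index = features, columns = strata, values = 1-based ranks.

    Assumption: keep a feature if any stratum ranks it at or above max_rank.
    """
    # NaN ranks compare as False, so missing ranks never keep a feature.
    mask = (df <= max_rank).any(axis=1)
    return df[mask]
```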

logml.report.plotters.feature_importance.plot_strata_ranks_clustermap(df, cols_filter=None, replace_empty=None, max_rank=None, cmap='Greens_r', figsize=(12, 22), title=None)

Display clustermap of strata features ranking.

logml.report.plotters.feature_importance.plot_stata_similarity_heatmaps(col_width, labels, plots_per_row, row_height, stats_data)

Plots strata similarity heatmaps.
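This reference does not define the similarity measure behind these heatmaps, and the internal structure of `stats_data` is not documented. As one plausible input, the sketch below computes a pairwise strata similarity matrix using Spearman rank correlation. It assumes rows are features, columns are strata, and values are ranks. Both the measure and the helper name `strata_similarity` are assumptions.

```python
import pandas as pd


def strata_similarity(ranks: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: strata-by-strata similarity of feature rankings.

    Assumes rows are features, columns are strata, values are ranks.
    Spearman correlation is an assumed choice of similarity measure.
    """
    return ranks.corr(method="spearman")
```

The resulting square matrix is the kind of data a similarity heatmap would display. 1 means identical orderings and -1 means reversed orderings.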

logml.report.plotters.feature_importance.show_numerical_target_vs_numerical_features_associations(dataframe: pandas.core.frame.DataFrame, target: str, features: List[str])

Creates an interactive view for examining target vs features relationships.

Target column is assumed to be numerical. Features are assumed to be numerical.
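When both the target and the features are numerical, a simple static counterpart to such an interactive view is the Pearson correlation of each feature with the target. The sketch uses `DataFrame.corrwith`. Sorting by absolute correlation is an illustrative choice, and `target_feature_correlations` is hypothetical.

```python
from typing import List

import pandas as pd


def target_feature_correlations(dataframe: pd.DataFrame, target: str, features: List[str]) -> pd.Series:
    """Hypothetical summary: Pearson correlation of each feature with the target."""
    corr = dataframe[features].corrwith(dataframe[target])
    # Order features by strength of association, regardless of sign.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```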

logml.report.plotters.feature_importance.show_numerical_target_vs_categorical_features_associations(dataframe: pandas.core.frame.DataFrame, target: str, features: List[str])

Creates an interactive view for examining target vs features relationships.

Target column is assumed to be numerical. Features are assumed to be categorical.
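When the target is numerical and a feature is categorical, the natural comparison is the distribution of the target within each category. A minimal pandas sketch of that summary follows. The choice of statistics is an assumption, and `target_by_category` is hypothetical.

```python
import pandas as pd


def target_by_category(dataframe: pd.DataFrame, target: str, feature: str) -> pd.DataFrame:
    """Hypothetical summary: numerical target statistics per category of one feature."""
    return dataframe.groupby(feature)[target].agg(["count", "mean", "median"])
```

Swapping the roles (grouping by a categorical target and summarizing a numerical feature) gives the analogous summary for show_categorical_target_vs_numerical_features_associations.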

logml.report.plotters.feature_importance.show_categorical_target_vs_numerical_features_associations(dataframe: pandas.core.frame.DataFrame, target: str, features: List[str])

Creates an interactive view for examining target vs features relationships.

Target column is assumed to be categorical. Features are assumed to be numerical.

logml.report.plotters.feature_importance.generate_buttons(dataframe: pandas.core.frame.DataFrame, target: str, features: List[str], swap_axes: bool = False, **kwargs) → List[Dict]

Returns a list of simple plotly-compatible button configurations.
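Plotly buttons are plain dictionaries with `label`, `method` and `args` keys. The sketch below builds one button per feature using the `"update"` method. Each button replaces the trace data and the axis titles, so a single scatter plot can switch between features. The following details are assumptions, not documented behavior of generate_buttons: which column goes on which axis, what `swap_axes` flips, and the use of `"update"` rather than `"restyle"`. `make_buttons_sketch` is hypothetical.

```python
from typing import Dict, List

import pandas as pd


def make_buttons_sketch(
    dataframe: pd.DataFrame, target: str, features: List[str], swap_axes: bool = False
) -> List[Dict]:
    """Hypothetical sketch of plotly updatemenu buttons, one per feature."""
    buttons = []
    for feature in features:
        # Assumption: feature on x, target on y; swap_axes flips them.
        x_col, y_col = (target, feature) if swap_axes else (feature, target)
        buttons.append(
            {
                "label": feature,
                "method": "update",
                "args": [
                    # Trace update: lists so the values are JSON-serializable.
                    {"x": [dataframe[x_col].tolist()], "y": [dataframe[y_col].tolist()]},
                    # Layout update using plotly attribute-path strings.
                    {"xaxis.title.text": x_col, "yaxis.title.text": y_col},
                ],
            }
        )
    return buttons
```

A list like this is typically attached to a figure with `fig.update_layout(updatemenus=[{"buttons": buttons}])`.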