evaluma.plot
Functions
| plot_aggregate_ranking() | Render aggregate scores as a horizontal bar chart (no CI whiskers). |
| plot_iqm_ranking() | Render IQM scores as a horizontal bar chart with CI error bars. |
| plot_bayesian_heatmap() | Render Bayesian pairwise probabilities as a matplotlib heatmap. |
| plot_bayesian_reference_bars() | Render Bayesian comparison against a reference as stacked horizontal bars. |
| plot_cd_diagram() | Render a Critical Difference diagram (Demšar 2006). |
| plot_frequentist_reference_bars() | Render frequentist reference-mode results as horizontal bars. |
| plot_performance_profiles() | Render Dolan-Moré performance profile curves. |
Module Contents
- evaluma.plot.plot_aggregate_ranking(table: pandas.DataFrame, *, figsize=None, model_colors=None, title=None, ax=None)
Render aggregate scores as a horizontal bar chart (no CI whiskers).
- Parameters:
table – DataFrame with columns model and score.
figsize – Figure size (width, height) in inches.
model_colors – List of colors, one per model in row order.
title – Optional axes title.
ax – Existing axes to draw into; a new figure is created if None.
- Returns:
The rendered figure.
- Return type:
matplotlib.figure.Figure
- evaluma.plot.plot_iqm_ranking(table: pandas.DataFrame, *, figsize=None, model_colors=None, title=None, ax=None)
Render IQM scores as a horizontal bar chart with CI error bars.
- Parameters:
table – DataFrame with columns model, IQM, CI_low, CI_high, as produced by compute_iqm().
figsize – Figure size (width, height) in inches.
model_colors – List of colors, one per model in row order.
title – Optional axes title.
ax – Existing axes to draw into; a new figure is created if None.
- Returns:
The rendered figure.
- Return type:
matplotlib.figure.Figure
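The CI_low and CI_high columns are absolute bounds, whereas matplotlib error bars are specified as offsets from the bar value. The following is a minimal sketch of that conversion, assuming the column names listed above; the sample rows and the helper name ci_offsets are hypothetical, and this is not necessarily how plot_iqm_ranking() does it internally.

```python
# Hypothetical rows shaped like the table described above.
rows = [
    {"model": "model-x", "IQM": 0.62, "CI_low": 0.55, "CI_high": 0.70},
    {"model": "model-y", "IQM": 0.48, "CI_low": 0.40, "CI_high": 0.53},
]


def ci_offsets(rows):
    """Turn absolute CI bounds into [lower, upper] offsets from the IQM.

    The result has the (2, N) layout that matplotlib accepts for `xerr`:
    first the distances below each value, then the distances above it.
    """
    lower = [r["IQM"] - r["CI_low"] for r in rows]
    upper = [r["CI_high"] - r["IQM"] for r in rows]
    if any(v < 0 for v in lower + upper):
        raise ValueError("each IQM must lie inside its [CI_low, CI_high] interval")
    return [lower, upper]


xerr = ci_offsets(rows)
```

The intervals are usually asymmetric around the IQM, which is why the two rows are kept separate instead of being collapsed into a single half-width.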
- evaluma.plot.plot_bayesian_heatmap(table: pandas.DataFrame, *, title=None, figsize=None, **_kwargs)
Render Bayesian pairwise probabilities as a matplotlib heatmap.
Each cell (i, j) shows P(model_i > model_j).
- Parameters:
table – DataFrame with columns model_a, model_b, p_a_better, p_equiv, p_b_better.
title – Optional figure title.
figsize – Figure size (width, height) in inches.
- Returns:
The rendered figure.
- Return type:
matplotlib.figure.Figure
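To make the cell layout concrete, here is a rough sketch of how a long-format pairwise table could be arranged into a square matrix. It assumes each unordered pair appears once, so that p_a_better fills cell (a, b) and p_b_better fills the mirrored cell (b, a); the sample values and the helper name pairwise_matrix are hypothetical, and the library may build its matrix differently.

```python
# Hypothetical pairwise rows using the column names described above.
rows = [
    {"model_a": "A", "model_b": "B", "p_a_better": 0.7, "p_equiv": 0.1, "p_b_better": 0.2},
    {"model_a": "A", "model_b": "C", "p_a_better": 0.9, "p_equiv": 0.05, "p_b_better": 0.05},
    {"model_a": "B", "model_b": "C", "p_a_better": 0.6, "p_equiv": 0.3, "p_b_better": 0.1},
]


def pairwise_matrix(rows):
    """Return (models, matrix) where matrix[i][j] = P(model_i > model_j)."""
    models = sorted({r["model_a"] for r in rows} | {r["model_b"] for r in rows})
    index = {name: i for i, name in enumerate(models)}
    n = len(models)
    # The diagonal stays None: a model is never compared with itself.
    matrix = [[None] * n for _ in range(n)]
    for r in rows:
        i, j = index[r["model_a"]], index[r["model_b"]]
        matrix[i][j] = r["p_a_better"]  # P(a > b)
        matrix[j][i] = r["p_b_better"]  # P(b > a), mirrored cell (assumption)
    return models, matrix


models, matrix = pairwise_matrix(rows)
```

Because p_equiv is not shown in either cell, matrix[i][j] and matrix[j][i] need not sum to 1.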
- evaluma.plot.plot_bayesian_reference_bars(table: pandas.DataFrame, reference: str, *, title=None, figsize=None)
Render Bayesian comparison against a reference as stacked horizontal bars.
Each bar represents one model compared to the reference. Blue = P(model > reference), grey = P(equivalent), red = P(reference > model). Bars are sorted by P(model > reference) descending.
- Parameters:
table – DataFrame with columns model_a, model_b, p_a_better, p_equiv, p_b_better. Expects model_a == reference for all rows (as produced by compute_bayesian() in reference mode).
reference – Name of the reference model.
title – Optional figure title.
figsize – Figure size (width, height) in inches.
- Returns:
The rendered figure.
- Return type:
matplotlib.figure.Figure
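Since the reference sits in model_a, P(model > reference) for each compared model corresponds to p_b_better under that convention. The sketch below derives the three stacked segments and the bar order on that assumption; the sample rows and the helper name reference_segments are hypothetical and only illustrate the described layout.

```python
reference = "baseline"

# Hypothetical reference-mode rows: model_a is always the reference.
rows = [
    {"model_a": "baseline", "model_b": "m1", "p_a_better": 0.30, "p_equiv": 0.20, "p_b_better": 0.50},
    {"model_a": "baseline", "model_b": "m2", "p_a_better": 0.05, "p_equiv": 0.15, "p_b_better": 0.80},
    {"model_a": "baseline", "model_b": "m3", "p_a_better": 0.70, "p_equiv": 0.20, "p_b_better": 0.10},
]


def reference_segments(rows, reference):
    """Return one (model, p_model_better, p_equiv, p_reference_better) tuple per bar."""
    segments = []
    for r in rows:
        if r["model_a"] != reference:
            raise ValueError("expected model_a == reference in every row")
        # With the reference in model_a, 'b better' means the compared model wins.
        segments.append((r["model_b"], r["p_b_better"], r["p_equiv"], r["p_a_better"]))
    # Sort by P(model > reference), largest first, as described above.
    return sorted(segments, key=lambda s: s[1], reverse=True)


bars = reference_segments(rows, reference)
order = [name for name, *_ in bars]
```

Each tuple's three probabilities sum to 1, so every stacked bar spans the same total width.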
- evaluma.plot.plot_cd_diagram(avg_ranks: pandas.Series, cd: float, *, title=None, figsize=None)
Render a Critical Difference diagram (Demšar 2006).
Models are placed on a horizontal axis by average rank (rank 1 = best on the left). Thick horizontal bars connect cliques of models whose rank gap does not exceed the Nemenyi CD scalar. A CD bracket in the top-right corner shows the critical difference visually.
- Parameters:
avg_ranks – Series mapping model names to average rank (lower = better), as produced by compute_frequentist().
cd – Nemenyi critical difference scalar.
title – Optional axes title.
figsize – Figure size (width, height) in inches.
- Returns:
The rendered figure.
- Return type:
matplotlib.figure.Figure
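The clique rule can be illustrated with a small sketch: sort the models by average rank, then for each model collect the run of following models whose rank lies within cd of it, keeping only runs that are not already covered by an earlier one. The ranks, the cd value, and the helper name cd_cliques are hypothetical; this shows one straightforward way to find the groups, not necessarily the one plot_cd_diagram() uses.

```python
def cd_cliques(avg_ranks, cd):
    """Return maximal groups of models whose rank gap does not exceed cd."""
    ordered = sorted(avg_ranks.items(), key=lambda kv: kv[1])
    cliques = []
    last_end = -1  # index of the last model covered by the previous clique
    for i, (_, rank_i) in enumerate(ordered):
        j = i
        # Extend to the right while the gap to the clique's best model is <= cd.
        while j + 1 < len(ordered) and ordered[j + 1][1] - rank_i <= cd:
            j += 1
        # Keep groups of two or more that reach beyond the previous clique.
        if j > i and j > last_end:
            cliques.append([name for name, _ in ordered[i : j + 1]])
            last_end = j
    return cliques


# Hypothetical average ranks (lower = better) and critical difference.
avg_ranks = {"A": 1.2, "B": 1.9, "C": 2.6, "D": 4.3}
cliques = cd_cliques(avg_ranks, cd=1.0)
```

Here A and B are joined, and B and C are joined, but A and C are 1.4 ranks apart and so do not share a bar; D stands alone.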
- evaluma.plot.plot_frequentist_reference_bars(table: pandas.DataFrame, reference: str, alpha: float, *, title=None, figsize=None)
Render frequentist reference-mode results as horizontal bars.
Each bar shows the Holm-corrected p-value for a model vs the reference. A vertical dashed line marks the significance threshold.
- Parameters:
table – DataFrame with columns model_a, model_b, p_value_corrected, significant, as produced by compute_frequentist() in reference mode.
reference – Name of the reference model.
alpha – Significance threshold; used to position the dashed line.
title – Optional figure title.
figsize – Figure size (width, height) in inches.
- Returns:
The rendered figure.
- Return type:
matplotlib.figure.Figure
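For readers unfamiliar with the Holm correction, the sketch below shows the standard step-down adjustment and how the corrected values relate to a threshold alpha. The raw p-values and the helper name holm_correct are hypothetical, and whether compute_frequentist() applies exactly this variant is an assumption.

```python
def holm_correct(p_values):
    """Holm step-down adjustment; returns adjusted p-values in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is scaled by (m - k + 1).
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three models compared against the reference.
raw = [0.01, 0.04, 0.03]
alpha = 0.05
corrected = holm_correct(raw)
significant = [p < alpha for p in corrected]
```

With these inputs only the first comparison stays below alpha after correction, so only its bar would end to the left of the dashed line.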
- evaluma.plot.plot_performance_profiles(table: pandas.DataFrame, *, figsize=None, model_colors=None, title=None, ax=None)
Render Dolan-Moré performance profile curves.
The x-axis uses a native log₁₀ scale with raw τ ratio values (1, 2, 5, 10…), following ML-GYM (Batra et al., 2025) and the AutoML Decathlon (Roberts et al., 2022). τ = 1 means tied for best; τ = 10 means 10× worse than the best model.
- Parameters:
table – Long-format DataFrame with columns tau, model, fraction_within_tau.
figsize – Figure size in inches.
model_colors – Dict mapping model names to colors, or a list in model order.
title – Optional axes title.
ax – Existing axes to draw into; a new figure is created if None.
- Returns:
The rendered figure.
- Return type:
matplotlib.figure.Figure
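As an illustration of what the long-format table encodes, the following sketch computes Dolan-Moré profile points from per-task costs. It assumes positive, lower-is-better costs, so each ratio is the model's cost divided by the best cost on that task; the cost values and the helper name performance_profile are hypothetical, and the library's own preprocessing may differ.

```python
def performance_profile(costs, taus):
    """Return long-format rows with keys tau, model, fraction_within_tau.

    costs maps each model name to a list of positive per-task costs
    (lower is better), with the same task order for every model.
    """
    models = list(costs)
    n_tasks = len(next(iter(costs.values())))
    # Best (smallest) cost achieved by any model on each task.
    best = [min(costs[m][t] for m in models) for t in range(n_tasks)]
    rows = []
    for m in models:
        ratios = [costs[m][t] / best[t] for t in range(n_tasks)]
        for tau in taus:
            # Fraction of tasks on which the model is within a factor tau of the best.
            fraction = sum(r <= tau for r in ratios) / n_tasks
            rows.append({"tau": tau, "model": m, "fraction_within_tau": fraction})
    return rows


# Hypothetical costs for two models on four tasks.
costs = {"A": [1.0, 2.0, 4.0, 1.0], "B": [2.0, 2.0, 1.0, 10.0]}
profile = performance_profile(costs, taus=[1, 2, 5, 10])
fractions = {(r["model"], r["tau"]): r["fraction_within_tau"] for r in profile}
```

The value at τ = 1 is the share of tasks on which a model is tied for best, and each curve is non-decreasing in τ.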