evaluma.methods.iqm

Functions

compute_iqm(raw_runs, norm_bounds[, n_bootstrap, ...])

Compute Agarwal IQM on the flat run×dataset array with stratified bootstrap CIs.

Module Contents

evaluma.methods.iqm.compute_iqm(raw_runs, norm_bounds, n_bootstrap=1000, random_state=None)

Compute Agarwal IQM on the flat run×dataset array with stratified bootstrap CIs.

Implements the IQM from Agarwal et al. 2021 (rliable): pool the per-dataset, per-seed normalized scores into one flat array, discard the lowest 25% and the highest 25%, and average the remaining middle 50%. Bootstrap CIs are stratified: seeds are resampled independently within each dataset stratum.
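The procedure described above can be sketched with NumPy alone. This is an illustrative sketch, not the evaluma implementation: the helper names `iqm` and `stratified_bootstrap_ci`, the `(n_seeds, n_datasets)` array layout, and the use of a percentile interval are all assumptions made for the example.

```python
import numpy as np


def iqm(scores):
    """Mean of the middle 50% of the flattened scores (hypothetical helper)."""
    flat = np.sort(np.asarray(scores, dtype=float).ravel())
    cut = int(0.25 * flat.size)  # number of values dropped from each tail
    return flat[cut:flat.size - cut].mean()


def stratified_bootstrap_ci(scores, n_bootstrap=1000, random_state=None, alpha=0.05):
    """Percentile CI for the IQM; `scores` has shape (n_seeds, n_datasets).

    Assumption: a plain percentile interval. The library may use another method.
    """
    scores = np.asarray(scores, dtype=float)
    n_seeds, n_datasets = scores.shape
    rng = np.random.default_rng(random_state)
    cols = np.arange(n_datasets)
    stats = np.empty(n_bootstrap)
    for b in range(n_bootstrap):
        # Draw seed indices independently for every dataset column (stratum).
        idx = rng.integers(0, n_seeds, size=(n_seeds, n_datasets))
        stats[b] = iqm(scores[idx, cols])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi


# Toy normalized scores for one model: 4 seeds x 3 datasets (made-up values).
toy = np.array([
    [0.2, 0.5, 0.9],
    [0.3, 0.6, 0.8],
    [0.1, 0.4, 1.0],
    [0.2, 0.5, 0.7],
])
point = iqm(toy)
low, high = stratified_bootstrap_ci(toy, n_bootstrap=200, random_state=0)
```

Because each column is resampled on its own, every bootstrap replicate keeps the same number of runs per dataset, so no dataset is over- or under-represented in a replicate.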

Parameters:

raw_runs – Long-format DataFrame with columns ["model", "dataset", "seed", "score"].
norm_bounds – (low, high, metric_direction) where low and high are per-dataset pd.Series of normalization bounds and metric_direction is a {dataset: "min"|"max"} dict (or None).
n_bootstrap – Number of stratified bootstrap replicates for the 95% CI.
random_state – Seed for numpy.random.default_rng().
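As a rough illustration of the input shapes described above, the following sketch builds a toy `raw_runs` frame and a `norm_bounds` tuple. The `normalize` helper and its min-max formula (flipped for `"min"` metrics) are assumptions made for the example; the normalization that evaluma actually applies is not documented here.

```python
import pandas as pd

# Toy long-format runs: 2 models x 2 datasets x 2 seeds (made-up values).
raw_runs = pd.DataFrame({
    "model":   ["a", "a", "a", "a", "b", "b", "b", "b"],
    "dataset": ["d1", "d1", "d2", "d2", "d1", "d1", "d2", "d2"],
    "seed":    [0, 1, 0, 1, 0, 1, 0, 1],
    "score":   [0.8, 0.9, 2.0, 3.0, 0.6, 0.7, 4.0, 5.0],
})

# Per-dataset bounds plus the metric direction of each dataset.
low = pd.Series({"d1": 0.5, "d2": 0.0})
high = pd.Series({"d1": 1.0, "d2": 10.0})
metric_direction = {"d1": "max", "d2": "min"}
norm_bounds = (low, high, metric_direction)


def normalize(raw_runs, norm_bounds):
    """Hypothetical min-max normalization; "min" metrics are flipped so higher is better."""
    low, high, direction = norm_bounds
    lo = raw_runs["dataset"].map(low)
    hi = raw_runs["dataset"].map(high)
    norm = (raw_runs["score"] - lo) / (hi - lo)
    if direction is not None:
        is_min = raw_runs["dataset"].map(direction).eq("min")
        norm = norm.where(~is_min, 1.0 - norm)
    return raw_runs.assign(norm_score=norm)


normalized = normalize(raw_runs, norm_bounds)
```

With inputs of this shape, the documented call would be `compute_iqm(raw_runs, norm_bounds, n_bootstrap=1000, random_state=0)`.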

Returns:

Result object whose .table attribute is sorted by IQM in descending order.

Return type:

IQMResult