evaluma.methods.iqm
Functions
| compute_iqm | Compute Agarwal IQM on the flat run×dataset array with stratified bootstrap CIs. |
Module Contents
- evaluma.methods.iqm.compute_iqm(raw_runs, norm_bounds, n_bootstrap=1000, random_state=None)
Compute Agarwal IQM on the flat run×dataset array with stratified bootstrap CIs.
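Implements the interquartile mean (IQM) from Agarwal et al. 2021 (rliable): pool the per-dataset, per-seed normalized scores into one flat array, discard the lowest 25% and the highest 25% of the values, and average the middle 50%. Bootstrap CIs are stratified: seeds are resampled with replacement independently within each dataset stratum, so every replicate keeps the same number of runs per dataset.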
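The procedure described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the code of `compute_iqm`: it uses only the standard library (the library itself seeds `numpy.random.default_rng()`), it takes a hypothetical `{dataset: [normalized score per seed]}` dict for a single model instead of the long-format DataFrame, it floors the trim count as `scipy.stats.trim_mean` does, and it assumes a simple nearest-rank percentile interval, which the source does not specify.

```python
import random


def iqm(scores):
    """Interquartile mean: drop the lowest and highest 25%, average the rest."""
    ordered = sorted(scores)
    n = len(ordered)
    k = int(0.25 * n)  # floor, following scipy.stats.trim_mean's convention
    middle = ordered[k:n - k]
    return sum(middle) / len(middle)


def stratified_bootstrap_iqm(scores_by_dataset, n_bootstrap=1000, random_state=None):
    """Return (point_estimate, ci_low, ci_high) for one model.

    scores_by_dataset is a hypothetical {dataset: [normalized scores]} dict,
    one score per seed; it is not the library's input format.
    """
    rng = random.Random(random_state)

    # Point estimate on the flat run x dataset array.
    flat = [s for seeds in scores_by_dataset.values() for s in seeds]
    point = iqm(flat)

    replicates = []
    for _ in range(n_bootstrap):
        resampled = []
        for seeds in scores_by_dataset.values():
            # Resample seeds with replacement inside this dataset only,
            # so each stratum keeps its original number of runs.
            resampled.extend(rng.choices(seeds, k=len(seeds)))
        replicates.append(iqm(resampled))

    replicates.sort()
    # Nearest-rank 2.5th and 97.5th percentiles (assumed interval method).
    ci_low = replicates[int(0.025 * (n_bootstrap - 1))]
    ci_high = replicates[int(0.975 * (n_bootstrap - 1))]
    return point, ci_low, ci_high


scores = {
    "dataset_a": [0.10, 0.20, 0.30, 0.40],
    "dataset_b": [0.50, 0.60, 0.70, 0.80],
}
point, ci_low, ci_high = stratified_bootstrap_iqm(
    scores, n_bootstrap=200, random_state=0
)
```

With eight pooled scores, two are trimmed from each end, so the point estimate is the mean of the four middle values. Resampling within each dataset rather than across the pooled array prevents a replicate from over- or under-representing any one dataset.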
- Parameters:
  - raw_runs – Long-format DataFrame with columns ["model", "dataset", "seed", "score"].
  - norm_bounds – (low, high, metric_direction) where low and high are per-dataset pd.Series of normalization bounds and metric_direction is a {dataset: "min"|"max"} dict (or None).
  - n_bootstrap – Number of stratified bootstrap replicates for the 95% CI.
  - random_state – Seed for numpy.random.default_rng().
- Returns:
  Result with .table sorted descending by IQM.
- Return type: