evaluma.methods.aggregate

Attributes

Functions

compute_aggregate(→ evaluma.results.AggregateResult)

Compute a point-estimate descriptive ranking from a normalized score matrix.

Module Contents

evaluma.methods.aggregate._AGG_MODES
evaluma.methods.aggregate.compute_aggregate(scores_matrix: pandas.DataFrame, agg='trimmed_mean') → evaluma.results.AggregateResult

Compute a point-estimate descriptive ranking from a normalized score matrix.

Note

This is a descriptive point estimate only; no confidence interval (CI) is computed. The trimmed-mean variant trims across datasets, not across seeds. With fewer than ~10 datasets the 25% trim is aggressive (e.g. with 5 datasets, only 3 contribute). Treat results as exploratory. For a statistically grounded ranking with uncertainty, use evaluma.methods.iqm.compute_iqm() (requires multiple seeds).
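The arithmetic behind the "5 datasets, only 3 contribute" remark can be sketched in a few lines. This is an illustration, not evaluma's code: the helper `trimmed_mean`, the sample scores, and the rule of dropping floor(0.25 × n) values from each tail are assumptions chosen to match the example in the note.

```python
import math


def trimmed_mean(values, trim=0.25):
    """Return (mean, n_kept) after cutting floor(trim * n) values per tail.

    Assumed trimming rule; the library may round differently.
    """
    ordered = sorted(values)
    k = math.floor(trim * len(ordered))   # values dropped from each end
    kept = ordered[k:len(ordered) - k]    # the middle portion that contributes
    return sum(kept) / len(kept), len(kept)


# Hypothetical normalized scores for one model on five datasets.
scores = [0.10, 0.60, 0.70, 0.80, 0.95]
value, n_used = trimmed_mean(scores)
print(n_used)            # 3 of the 5 datasets contribute
print(round(value, 2))   # 0.7
```

Under this rule the lowest and highest dataset scores are discarded, so with only a handful of datasets a single unusual dataset can change which scores survive the trim.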

Parameters:
  • scores_matrix – Normalized model × dataset score matrix.

  • agg – Aggregation mode: one of "trimmed_mean", "mean", or "median". Defaults to "trimmed_mean".

Returns:

Result whose .table is sorted in descending order of score.

Return type:

AggregateResult

Raises:

ValueError – If agg is not one of the supported modes.
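The documented contract (three modes, a ValueError for anything else, output sorted in descending order) can be illustrated with a stand-alone sketch. It is not the library's implementation: `sketch_aggregate`, `SUPPORTED`, the model and dataset names, and the int(0.25 × n) per-tail trim are all assumptions, and the sketch returns a plain pandas Series rather than an AggregateResult.

```python
import pandas as pd

SUPPORTED = ("trimmed_mean", "mean", "median")  # modes named in the docs


def _trimmed(row: pd.Series, trim: float = 0.25) -> float:
    # Assumed rule: drop int(trim * n) datasets from each tail of the row.
    ordered = row.sort_values()
    k = int(trim * len(ordered))
    return float(ordered.iloc[k:len(ordered) - k].mean())


def sketch_aggregate(scores: pd.DataFrame, agg: str = "trimmed_mean") -> pd.Series:
    """Hypothetical stand-in: one aggregate score per model, best first."""
    if agg not in SUPPORTED:
        raise ValueError(f"agg must be one of {SUPPORTED}, got {agg!r}")
    if agg == "mean":
        out = scores.mean(axis=1)
    elif agg == "median":
        out = scores.median(axis=1)
    else:
        out = scores.apply(_trimmed, axis=1)  # aggregate across datasets
    return out.sort_values(ascending=False)


# Hypothetical normalized model x dataset matrix (rows: models).
scores = pd.DataFrame(
    {
        "d1": [0.9, 0.2],
        "d2": [0.8, 0.6],
        "d3": [0.7, 0.8],
        "d4": [0.1, 0.8],
        "d5": [0.6, 0.9],
    },
    index=["model_a", "model_b"],
)
ranking = sketch_aggregate(scores, agg="median")
print(ranking)
```

With evaluma installed, the corresponding call per the signature above would be compute_aggregate(scores, agg="median"), with the ranking read from the result's .table.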