evaluma.methods.rank_sensitivity

Functions

_validate_aligned_axes(scores_a, scores_b)

Validate that model and dataset label sets match across conditions.

_ranks_from_scores(→ pandas.Series)

Aggregate per-model scores and convert them to ranks.

compute_rank_sensitivity(...)

Compute ranking sensitivity between two model×dataset score matrices.

Module Contents

evaluma.methods.rank_sensitivity._validate_aligned_axes(scores_a: pandas.DataFrame, scores_b: pandas.DataFrame)

Validate that model and dataset label sets match across conditions.

Parameters:
  • scores_a – Condition A normalized scores (model × dataset).

  • scores_b – Condition B normalized scores (model × dataset).

Raises:
  • ValueError – If model label sets differ.

  • ValueError – If dataset label sets differ.
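
As an illustration of the check described above, here is a minimal sketch, not the library's implementation. The function name validate_aligned_axes is a hypothetical stand-in, and comparing labels as sets (ignoring order) is an assumption based on the phrase "label sets".

```python
import pandas as pd


def validate_aligned_axes(scores_a: pd.DataFrame, scores_b: pd.DataFrame) -> None:
    """Hypothetical stand-in: raise ValueError when label sets differ."""
    # Compare as sets so that row/column order does not matter (assumption).
    if set(scores_a.index) != set(scores_b.index):
        raise ValueError("Model label sets differ between conditions.")
    if set(scores_a.columns) != set(scores_b.columns):
        raise ValueError("Dataset label sets differ between conditions.")


# Rows are models, columns are datasets.
scores_a = pd.DataFrame({"d1": [0.9, 0.4], "d2": [0.8, 0.5]}, index=["m1", "m2"])
scores_b = pd.DataFrame({"d2": [0.6, 0.7], "d1": [0.7, 0.3]}, index=["m2", "m1"])

# Same labels in a different order: the check passes silently.
validate_aligned_axes(scores_a, scores_b)
```

Under this sketch, dropping a model or a dataset from one condition triggers the corresponding ValueError.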

evaluma.methods.rank_sensitivity._ranks_from_scores(scores: pandas.DataFrame, agg='trimmed_mean') → pandas.Series

Aggregate per-model scores and convert them to ranks.

Parameters:
  • scores – Normalized score matrix (model × dataset).

  • agg – Aggregation mode passed to _aggregate_scores(); defaults to "trimmed_mean" to match aggregate_ranking.

Returns:

Per-model ranks, with tied models sharing their average rank; rank 1 is best (higher score is better).

Return type:

pandas.Series
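
The aggregate-then-rank step can be sketched as follows. This is an illustration under stated assumptions, not the library's code: ranks_from_scores and trimmed_mean are hypothetical names, the 10% trim fraction is assumed (the trim used by _aggregate_scores() is not documented here), and missing values are not handled.

```python
import numpy as np
import pandas as pd


def trimmed_mean(values: np.ndarray, cut: float = 0.1) -> float:
    """Mean after dropping the lowest and highest `cut` fraction (10% is assumed)."""
    ordered = np.sort(values)
    k = int(len(ordered) * cut)  # number of values trimmed from each end
    return float(ordered[k: len(ordered) - k].mean())


def ranks_from_scores(scores: pd.DataFrame, agg: str = "trimmed_mean") -> pd.Series:
    """Hypothetical stand-in: aggregate each model's row, then rank the models."""
    reducers = {"trimmed_mean": trimmed_mean, "mean": np.mean, "median": np.median}
    if agg not in reducers:
        raise ValueError(f"Unsupported agg: {agg!r}")
    # One aggregate score per model (row).
    per_model = scores.apply(lambda row: reducers[agg](row.to_numpy()), axis=1)
    # Higher score is better, so rank in descending order; ties share the average rank.
    return per_model.rank(ascending=False, method="average")


scores = pd.DataFrame(
    {"d1": [0.9, 0.5, 0.5], "d2": [0.7, 0.6, 0.6]},
    index=["m1", "m2", "m3"],
)
ranks = ranks_from_scores(scores, agg="mean")
# m1 is ranked 1.0; m2 and m3 tie and share rank 2.5.
```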

evaluma.methods.rank_sensitivity.compute_rank_sensitivity(scores_a: pandas.DataFrame, scores_b: pandas.DataFrame, cond_a, cond_b, n_bootstrap=1000, random_state=None, agg='trimmed_mean') → evaluma.results.RankSensitivityResult

Compute ranking sensitivity between two model×dataset score matrices.

Parameters:
  • scores_a – Condition A normalized scores (model × dataset).

  • scores_b – Condition B normalized scores (model × dataset).

  • cond_a – Label for condition A (used in output table/plot labels).

  • cond_b – Label for condition B (used in output table/plot labels).

  • n_bootstrap – Number of dataset-bootstrap samples for the 95% CI.

  • random_state – Seed for numpy.random.default_rng.

  • agg – Per-model aggregation defining the ranking — "trimmed_mean" (default, matching aggregate_ranking), "mean", or "median".

Returns:

Rank sensitivity point estimates, CI, and table.

Return type:

RankSensitivityResult

Raises:
  • ValueError – If n_bootstrap < 0.

  • ValueError – If agg is not a supported mode.

  • ValueError – If model or dataset labels are misaligned.
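
To make the dataset bootstrap concrete, here is a minimal sketch under stated assumptions; it is not the library's implementation and does not build a RankSensitivityResult. The statistic (mean absolute rank shift between conditions), the plain-mean aggregation, the percentile interval, and the names mean_abs_rank_shift and bootstrap_rank_shift are all assumptions made for illustration. Treating n_bootstrap=0 as "skip the CI" is likewise a guess based on the error condition n_bootstrap < 0.

```python
import numpy as np
import pandas as pd


def mean_abs_rank_shift(a: pd.DataFrame, b: pd.DataFrame) -> float:
    """Assumed statistic: average absolute change in model rank between conditions."""
    ranks_a = a.mean(axis=1).rank(ascending=False)
    ranks_b = b.mean(axis=1).rank(ascending=False)
    return float((ranks_a - ranks_b).abs().mean())


def bootstrap_rank_shift(scores_a, scores_b, n_bootstrap=1000, random_state=None):
    """Hypothetical sketch: point estimate plus a 95% percentile CI over datasets."""
    if n_bootstrap < 0:
        raise ValueError("n_bootstrap must be non-negative.")
    # Align condition B to condition A's row and column order.
    scores_b = scores_b.loc[scores_a.index, scores_a.columns]
    point = mean_abs_rank_shift(scores_a, scores_b)
    if n_bootstrap == 0:
        return point, (float("nan"), float("nan"))  # assumption: no CI requested
    rng = np.random.default_rng(random_state)
    n_datasets = scores_a.shape[1]
    draws = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        # Resample dataset columns with replacement, using the same draw for both conditions.
        cols = rng.integers(0, n_datasets, size=n_datasets)
        draws[i] = mean_abs_rank_shift(scores_a.iloc[:, cols], scores_b.iloc[:, cols])
    low, high = np.percentile(draws, [2.5, 97.5])
    return point, (float(low), float(high))


datasets = ["d1", "d2", "d3", "d4"]
models = ["m1", "m2", "m3"]
scores_a = pd.DataFrame(
    [[0.9, 0.8, 0.85, 0.9], [0.6, 0.7, 0.65, 0.6], [0.3, 0.4, 0.35, 0.3]],
    index=models, columns=datasets,
)
scores_b = pd.DataFrame(
    [[0.5, 0.8, 0.6, 0.9], [0.7, 0.6, 0.8, 0.75], [0.3, 0.4, 0.35, 0.3]],
    index=models, columns=datasets,
)
point, (ci_low, ci_high) = bootstrap_rank_shift(
    scores_a, scores_b, n_bootstrap=200, random_state=0
)
```

Resampling the same dataset columns for both conditions keeps the comparison paired, so the interval reflects uncertainty over which datasets were chosen rather than noise between unrelated samples. Passing a fixed random_state makes the interval reproducible.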