evaluma.normalize

Functions

normalize(matrix, *[, norm_ref_low, norm_ref_high, ...])

Apply per-dataset min-max normalization to a score matrix.

_resolve_bound(mat, bound, use_min)

Resolve a normalization bound specification to a per-column Series.

Module Contents

evaluma.normalize.normalize(matrix, *, norm_ref_low=None, norm_ref_high=None, metric_direction=None)

Apply per-dataset min-max normalization to a score matrix.

Parameters:
  • matrix – Model × dataset score matrix (models as rows, datasets as columns).

  • norm_ref_low – Lower bound for normalization: a scalar, a model name (row label), or a {dataset: value} dict. None uses the per-dataset observed minimum and emits a UserWarning.

  • norm_ref_high – Upper bound for normalization, same format as norm_ref_low. None uses the per-dataset observed maximum.

  • metric_direction – Dict mapping dataset names to "min" or "max". Entries mapped to "min" cause the matrix to be negated before normalization.

Returns:

Normalized matrix with the same shape and index as matrix. Scores that lie between the reference bounds map to values in [0, 1].

Return type:

pandas.DataFrame

Raises:

ValueError – If norm_ref_low or norm_ref_high is a string that does not name a row in matrix.
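
The following is a minimal sketch of the arithmetic described above, written against plain pandas rather than evaluma itself. It assumes that only the datasets mapped to "min" are negated and that the bounds are the observed per-dataset extremes (the None case). The data and the names scores, oriented, low and high are hypothetical, and the actual implementation may differ.

```python
import pandas as pd

# Hypothetical score matrix: models as rows, datasets as columns.
scores = pd.DataFrame(
    {"accuracy": [0.50, 0.75, 1.00], "error": [4.0, 2.0, 1.0]},
    index=["model_a", "model_b", "model_c"],
)
metric_direction = {"accuracy": "max", "error": "min"}

# Assumption: negate lower-is-better datasets so larger always means better.
oriented = scores.copy()
for dataset, direction in metric_direction.items():
    if direction == "min":
        oriented[dataset] = -oriented[dataset]

# With no reference bounds given, use the observed per-dataset extremes.
low = oriented.min(axis=0)
high = oriented.max(axis=0)

# Min-max formula; the Series align with the DataFrame columns.
normalized = (oriented - low) / (high - low)
print(normalized)
```

In this sketch the best model on each dataset receives 1 and the worst receives 0, including on the "error" column where a lower raw score is better. The sketch does not clip, so with explicit reference bounds a score outside them would land outside [0, 1].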

evaluma.normalize._resolve_bound(mat, bound, use_min)

Resolve a normalization bound specification to a per-column Series.

Parameters:
  • mat – Score matrix (models × datasets).

  • bound – None (use data min/max), a scalar, a model-name string, or a {dataset: value} dict.

  • use_min – When bound is None, return the column-wise minimum if True, maximum if False.

Returns:

Per-column (dataset) bound values.

Return type:

pandas.Series

Raises:

ValueError – If bound is a string not present in mat.index.
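
To make the four accepted bound formats concrete, here is a hedged re-implementation of the behavior documented above. The function resolve_bound_sketch and the sample data are hypothetical, and this is not the evaluma source. In particular, how datasets missing from a dict are handled is an assumption.

```python
import pandas as pd


def resolve_bound_sketch(mat, bound, use_min):
    """Illustrative only: map a bound specification to one value per dataset."""
    if bound is None:
        # Fall back to the observed column-wise extreme.
        return mat.min(axis=0) if use_min else mat.max(axis=0)
    if isinstance(bound, str):
        # A string names a model row whose scores serve as the bound.
        if bound not in mat.index:
            raise ValueError(f"unknown model name: {bound!r}")
        return mat.loc[bound]
    if isinstance(bound, dict):
        # Assumption: datasets missing from the dict become NaN.
        return pd.Series(bound, dtype=float).reindex(mat.columns)
    # Scalar: the same bound for every dataset.
    return pd.Series(float(bound), index=mat.columns)


# Hypothetical matrix with a baseline row used as a reference model.
scores = pd.DataFrame(
    {"qa": [0.2, 0.8], "math": [10.0, 30.0]},
    index=["baseline", "model_a"],
)
print(resolve_bound_sketch(scores, "baseline", use_min=True))
```

Passing a model name such as "baseline" as the lower bound anchors 0 to that model's score on each dataset, which is a common way to express results relative to a reference system.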