evaluma.cli


`_parse_metric_direction`(ctx, param, value)	Parse `KEY:min` / `KEY:max` tokens into a metric-direction dict.
`_common_options`(f)	Attach shared CLI options to a Click command.
`_load_bench`(csv_path, model, dataset, metric, score, ...)	Load a CSV and return a normalized Benchmark, merging CLI args with config.
`_save`(result, stem, output_dir)	Serialize a result to CSV and PNG inside `output_dir`.
`main`()	evaluma — ML benchmark evaluation tools.
`report`(csv_path, model, dataset, metric, score, ...)	Run all three analyses and write results to `--output`.
`rank`(csv_path, model, dataset, metric, score, ...)	Compute IQM rankings (requires seed column) and write iqm_ranking.{csv,png}.
`aggregate`(csv_path, model, dataset, metric, score, ...)	Compute point-estimate aggregate ranking and write aggregate_ranking.{csv,png}.
`compare`(csv_path, model, dataset, metric, score, ...)	Compute Bayesian pairwise comparisons and write results.
`frequentist`(csv_path, model, dataset, metric, score, ...)	Compute Friedman + Nemenyi (all-pairs) or Wilcoxon + Holm (reference) comparison.
`profiles`(csv_path, model, dataset, metric, score, ...)	Compute Dolan-Moré performance profiles and write results.

evaluma.cli._parse_metric_direction(ctx, param, value)

Parse KEY:min / KEY:max tokens into a metric-direction dict.

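Parameters:

ctx (click.Context) – Click context passed to the option callback.
param (click.Parameter) – The option being parsed.
value – The raw KEY:min / KEY:max token(s) given on the command line.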

Returns:

Mapping from dataset name to "min" or "max", or None when value is empty.

Return type:

dict | None

Raises:

click.BadParameter – If a token is malformed or the direction is not "min" or "max".
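
A minimal sketch of the token parsing, assuming the tokens arrive as a sequence of strings (as with a Click option declared with multiple=True) and that the text before the last colon is the dataset name. The function name parse_metric_direction is hypothetical, and it raises ValueError where the documented callback raises click.BadParameter, so the sketch runs without Click installed.

```python
def parse_metric_direction(tokens):
    """Hypothetical stand-in for evaluma.cli._parse_metric_direction.

    Turns ["cifar10:max", "mnist:min"] into {"cifar10": "max", "mnist": "min"}.
    Returns None when no tokens were given. Raises ValueError where the real
    Click callback raises click.BadParameter.
    """
    if not tokens:
        return None
    directions = {}
    for token in tokens:
        # rpartition splits on the last colon, so a dataset name may contain ":".
        key, sep, direction = token.rpartition(":")
        if not sep or not key:
            raise ValueError(f"expected KEY:min or KEY:max, got {token!r}")
        if direction not in ("min", "max"):
            raise ValueError(f"direction must be 'min' or 'max', got {direction!r}")
        directions[key] = direction
    return directions
```

With this sketch, an empty option value yields None, which matches the documented return type dict | None.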

evaluma.cli._common_options(f)

Attach shared CLI options to a Click command.

evaluma.cli._load_bench(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, seed=None)

Load a CSV and return a normalized Benchmark, merging CLI args with config.

Parameters:

Returns:

Loaded and normalized benchmark.

Return type:

Benchmark
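
The description above does not say which source wins when a value is set both on the command line and in the config. A minimal sketch, assuming explicit CLI values take precedence and that None means "not supplied"; merge_options and the key names used below are hypothetical and not part of evaluma.

```python
def merge_options(cli_args, config):
    """Hypothetical merge rule: explicit CLI values override config values.

    A CLI value of None is treated as "not given", so the config value
    (if present) is kept instead.
    """
    merged = dict(config)
    for key, value in cli_args.items():
        if value is not None:
            merged[key] = value
    return merged


cli = {"model": "algo", "metric": None}
cfg = {"model": "name", "metric": "accuracy", "score": "value"}
print(merge_options(cli, cfg))  # {'model': 'algo', 'metric': 'accuracy', 'score': 'value'}
```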

evaluma.cli._save(result, stem, output_dir)

Serialize a result to CSV and PNG inside output_dir.

evaluma.cli.report(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir)

Run all three analyses and write results to --output.

evaluma.cli.rank(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, seed)

Compute IQM rankings (requires seed column) and write iqm_ranking.{csv,png}.
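
IQM (interquartile mean) is the mean of the middle 50% of scores, which makes it less sensitive to outlier seeds than the plain mean. A minimal sketch, assuming higher normalized scores are better and that each model's scores are pooled across seeds and datasets before trimming; iqm and iqm_ranking are hypothetical helpers. The number of values trimmed from each end matches scipy.stats.trim_mean(x, 0.25).

```python
def iqm(scores):
    """Interquartile mean: drop the lowest and highest 25% of values, average the rest."""
    values = sorted(scores)
    n = len(values)
    if n == 0:
        raise ValueError("iqm() requires at least one score")
    cut = int(0.25 * n)  # values removed from each end
    middle = values[cut:n - cut]
    return sum(middle) / len(middle)


def iqm_ranking(runs):
    """runs: {model: [normalized scores pooled over seeds and datasets]}.

    Returns (model, IQM) pairs sorted best first (higher is better).
    """
    table = {model: iqm(scores) for model, scores in runs.items()}
    return sorted(table.items(), key=lambda pair: pair[1], reverse=True)
```

For example, iqm([1, 2, 3, 4, 5, 6, 7, 100]) trims two values from each end and averages 3, 4, 5, 6, so the outlier 100 has no effect.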

evaluma.cli.aggregate(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, agg)

Compute point-estimate aggregate ranking and write aggregate_ranking.{csv,png}.
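
A point-estimate aggregate ranking collapses the seeds to a single number before ranking. A minimal sketch, assuming agg selects the reducer (mean or median here, which is an assumption about the allowed values), higher scores are better, and models are ranked within each dataset and then ordered by mean rank. aggregate_ranking, average_ranks, and AGGREGATORS are hypothetical.

```python
from statistics import mean, median

# Assumed choices for the `agg` option; evaluma's actual set may differ.
AGGREGATORS = {"mean": mean, "median": median}


def average_ranks(values):
    """Rank values so the highest gets 1; tied values share their average rank."""
    return [
        1 + sum(v > x for v in values) + (sum(v == x for v in values) - 1) / 2
        for x in values
    ]


def aggregate_ranking(scores, agg="mean"):
    """scores: {dataset: {model: [score per seed]}}; higher is better.

    Collapses seeds to one point estimate per (dataset, model), ranks models
    within each dataset, and returns (model, mean rank) pairs, best first.
    """
    reduce = AGGREGATORS[agg]
    ranks_by_model = {}
    for per_model in scores.values():
        models = sorted(per_model)
        points = [reduce(per_model[m]) for m in models]
        for model, rank in zip(models, average_ranks(points)):
            ranks_by_model.setdefault(model, []).append(rank)
    return sorted(
        ((model, mean(ranks)) for model, ranks in ranks_by_model.items()),
        key=lambda pair: pair[1],
    )
```

Ranking per dataset before averaging keeps a dataset with a large score scale from dominating the result.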

evaluma.cli.compare(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir)

Compute Bayesian pairwise comparisons and write results.
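
The description does not name the Bayesian model behind compare. As a stand-in, the sketch below uses a Bayesian bootstrap (Dirichlet-weighted reweighting of paired per-dataset differences) to estimate the posterior probability that one model's mean score exceeds another's. This is not necessarily the technique evaluma implements; prob_a_beats_b is hypothetical, and higher scores are assumed to be better.

```python
import random


def prob_a_beats_b(scores_a, scores_b, n_samples=4000, seed=0):
    """Bayesian-bootstrap estimate of P(mean of (A - B) > 0).

    scores_a[i] and scores_b[i] are paired scores on dataset i (higher is better).
    Each draw reweights the datasets with Dirichlet(1, ..., 1) weights.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need non-empty, paired score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_samples):
        # Normalized Exp(1) draws form a Dirichlet(1, ..., 1) sample.
        weights = [rng.expovariate(1.0) for _ in diffs]
        total = sum(weights)
        weighted_mean = sum(w * d for w, d in zip(weights, diffs)) / total
        wins += weighted_mean > 0
    return wins / n_samples
```

A value near 1.0 means A is better in almost every posterior draw; a value near 0.5 means the data cannot separate the two models.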

evaluma.cli.frequentist(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, reference, alpha)

Compute a frequentist comparison: Friedman + Nemenyi across all pairs of models, or Wilcoxon + Holm against the reference when one is given.
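
The all-pairs path starts from the Friedman statistic computed on per-dataset ranks, and the reference path adjusts one p-value per comparison with Holm's step-down method. A minimal sketch of just those two pieces, assuming higher scores are better; it omits the chi-square p-value, the Nemenyi critical difference (which needs studentized-range quantiles), and the Wilcoxon test itself. All function names are hypothetical.

```python
def average_ranks(values):
    """Rank values so the highest gets 1; tied values share their average rank."""
    return [
        1 + sum(v > x for v in values) + (sum(v == x for v in values) - 1) / 2
        for x in values
    ]


def friedman_statistic(scores):
    """Friedman chi-square statistic.

    scores: one row per dataset, each row holding one score per model.
    chi2_F = 12N / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4), R_j = mean rank of model j.
    """
    n = len(scores)     # number of datasets
    k = len(scores[0])  # number of models
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    return 12 * n / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for step, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on;
        # the running maximum keeps adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - step) * p_values[i]))
        adjusted[i] = running_max
    return adjusted
```

A comparison is then declared significant when its adjusted p-value is at most alpha.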

evaluma.cli.profiles(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir)

Compute Dolan-Moré performance profiles and write results.
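
A Dolan-Moré profile gives, for each model, the fraction of problems (here, datasets) on which its cost is within a factor tau of the best cost on that problem. A minimal sketch, assuming positive costs where lower is better; a higher-is-better metric would first need converting to a cost, and how evaluma handles that is not documented here. performance_profile is hypothetical.

```python
def performance_profile(costs, taus):
    """Dolan-Moré performance profile.

    costs: {model: [cost on problem 0, cost on problem 1, ...]}, lower is better, all > 0.
    Returns {model: [rho(tau) for tau in taus]}, where rho(tau) is the fraction
    of problems on which cost / best cost <= tau.
    """
    models = list(costs)
    n_problems = len(costs[models[0]])
    best = [min(costs[m][p] for m in models) for p in range(n_problems)]
    ratios = {m: [costs[m][p] / best[p] for p in range(n_problems)] for m in models}
    return {
        m: [sum(r <= tau for r in ratios[m]) / n_problems for tau in taus]
        for m in models
    }
```

At tau = 1 the profile is the share of problems a model wins outright; as tau grows it approaches the share of problems the model handles at all.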