evaluma.cli
Functions
| _parse_metric_direction | Parse KEY:min/KEY:max tokens into a metric-direction dict. |
| _common_options | Attach shared CLI options to a Click command. |
| _load_bench | Load a CSV and return a normalized Benchmark, merging CLI args with config. |
| _save | Serialize a result to CSV and PNG inside output_dir. |
| main | evaluma — ML benchmark evaluation tools. |
| report | Run all three analyses and write results to --output. |
| rank | Compute IQM rankings (requires seed column) and write iqm_ranking.{csv,png}. |
| aggregate | Compute point-estimate aggregate ranking and write aggregate_ranking.csv/png. |
| compare | Compute Bayesian pairwise comparisons and write results. |
| frequentist | Compute Friedman + Nemenyi (all-pairs) or Wilcoxon + Holm (reference) comparison. |
| profiles | Compute Dolan-Moré performance profiles and write results. |
Module Contents
- evaluma.cli._parse_metric_direction(ctx, param, value)
Parse KEY:min/KEY:max tokens into a metric-direction dict.
- Parameters:
ctx – Click context (unused; required by the callback protocol).
param – Click parameter (unused).
value – Tuple of strings, each formatted as "KEY:min" or "KEY:max".
- Returns:
Mapping from dataset name to "min" or "max", or None when value is empty.
- Return type:
dict | None
- Raises:
click.BadParameter – If a token is malformed or the direction is not "min" or "max".
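The parsing rule described above can be sketched in plain Python. This is a minimal illustration, not evaluma's actual implementation: the function name `parse_metric_direction` and the keys `acc` and `loss` are hypothetical, and `ValueError` stands in for `click.BadParameter` so the snippet runs without Click installed.

```python
def parse_metric_direction(value):
    """Turn ("acc:max", "loss:min") into {"acc": "max", "loss": "min"}.

    Hypothetical stand-in for the Click callback: it takes only the tuple
    of tokens and raises ValueError where the real callback is documented
    to raise click.BadParameter.
    """
    if not value:
        # An empty tuple means the option was not given on the command line.
        return None

    directions = {}
    for token in value:
        # Split on the last colon so that keys containing ":" still parse.
        key, sep, direction = token.rpartition(":")
        if not sep or not key:
            raise ValueError(f"expected KEY:min or KEY:max, got {token!r}")
        if direction not in ("min", "max"):
            raise ValueError(f"direction must be 'min' or 'max', got {direction!r}")
        directions[key] = direction
    return directions
```

Splitting on the last colon is a design choice of this sketch; whether evaluma splits on the first or the last colon is not stated in the documentation.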
- evaluma.cli._common_options(f)
Attach shared CLI options to a Click command.
- evaluma.cli._load_bench(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, seed=None)
Load a CSV and return a normalized Benchmark, merging CLI args with config.
- Parameters:
csv_path – Path to the input CSV file.
model – CLI value for the model column name.
dataset – CLI value for the dataset column name.
metric – CLI value for the metric column name.
score – CLI value for the score column name.
config_path – Optional path to a YAML config file.
metric_direction – Parsed metric-direction dict (or None).
output_dir – Path to the output directory (created if absent).
seed – Optional column name for the random seed.
- Returns:
Loaded and normalized benchmark.
- Return type:
Benchmark
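One plausible reading of "merging CLI args with config" is that a value passed on the command line overrides the config file, and the config fills in whatever the command line left unset. The sketch below illustrates that precedence only. The rule itself is an assumption (the documentation does not state which side wins), and `merge_options` and the column names are hypothetical.

```python
def merge_options(cli_args, config):
    """Combine CLI values with config values.

    Assumed precedence (not confirmed by the evaluma docs): a CLI value
    that is not None overrides the config entry of the same name.
    """
    merged = dict(config)  # start from the config file contents
    for key, value in cli_args.items():
        if value is not None:  # None means "not given on the command line"
            merged[key] = value
    return merged


# Hypothetical column names, for illustration only.
config = {"model": "model_name", "dataset": "task", "score": "value"}
cli_args = {"model": None, "dataset": "benchmark", "score": None}
merged = merge_options(cli_args, config)
```

Here `merged` keeps `model` and `score` from the config and takes `dataset` from the command line.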
- evaluma.cli._save(result, stem, output_dir)
Serialize a result to CSV and PNG inside output_dir.
- evaluma.cli.main()
evaluma — ML benchmark evaluation tools.
- evaluma.cli.report(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir)
Run all three analyses and write results to --output.
- evaluma.cli.rank(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, seed)
Compute IQM rankings (requires seed column) and write iqm_ranking.{csv,png}.
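For readers unfamiliar with the interquartile mean (IQM), it is the mean of the middle 50% of the values, which is why several runs (seeds) per model are needed. The sketch below shows the basic statistic only. It assumes a simple whole-sample trim of the lowest and highest quarter, and `interquartile_mean` is a hypothetical name; evaluma's own computation may differ, for example in how it handles sample sizes not divisible by four or in how it estimates uncertainty.

```python
def interquartile_mean(scores):
    """Mean of the middle 50% of `scores` (simple whole-sample trimming)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("scores must not be empty")
    cut = n // 4  # number of values dropped from each end
    middle = ordered[cut:n - cut]
    return sum(middle) / len(middle)


# Eight hypothetical per-seed scores with one outlier.
iqm = interquartile_mean([1, 2, 3, 4, 5, 6, 7, 100])  # middle values: 3, 4, 5, 6
```

The outlier 100 falls in the discarded top quarter, so it does not affect the result.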
- evaluma.cli.aggregate(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, agg)
Compute point-estimate aggregate ranking and write aggregate_ranking.{csv,png}.
- evaluma.cli.compare(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir)
Compute Bayesian pairwise comparisons and write results.
- evaluma.cli.frequentist(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, reference, alpha)
Compute Friedman + Nemenyi (all-pairs) or Wilcoxon + Holm (reference) comparison.
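In the reference mode, each model is compared against the reference and the resulting p-values are corrected for multiple testing with Holm's method. The following sketch shows the standard Holm step-down adjustment on a list of raw p-values. It is an illustration of the correction itself, not evaluma's code; `holm_adjust` and the example p-values are hypothetical.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the same order as the input."""
    m = len(p_values)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease along the order.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values from three comparisons against a reference.
adjusted = holm_adjust([0.01, 0.04, 0.03])
```

A comparison is then declared significant when its adjusted p-value is at most the chosen `alpha`.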
- evaluma.cli.profiles(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir)
Compute Dolan-Moré performance profiles and write results.
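A Dolan-Moré performance profile reports, for each method and each tolerance τ, the fraction of problems on which that method is within a factor τ of the best method. The sketch below computes this from a small table of costs. It assumes strictly positive costs where lower is better, and the names `performance_profile`, `A`, and `B` are hypothetical; how evaluma maps scores and metric directions onto such costs is not described here.

```python
def performance_profile(costs, taus):
    """Compute rho_s(tau) for each method s.

    costs: dict mapping method name -> list of positive costs, one per problem
           (all lists have the same length; lower is better).
    taus:  iterable of ratio thresholds (each >= 1).
    """
    methods = list(costs)
    n_problems = len(costs[methods[0]])
    # Best (smallest) cost achieved on each problem by any method.
    best = [min(costs[m][p] for m in methods) for p in range(n_problems)]

    profile = {}
    for m in methods:
        # Performance ratio of method m on each problem.
        ratios = [costs[m][p] / best[p] for p in range(n_problems)]
        # Fraction of problems whose ratio is within each threshold.
        profile[m] = [sum(r <= tau for r in ratios) / n_problems for tau in taus]
    return profile


# Two hypothetical methods on three problems.
profile = performance_profile({"A": [1, 2, 4], "B": [2, 2, 1]}, taus=[1, 2, 4])
```

The value at τ = 1 is the share of problems on which a method is (tied for) best, and the curve reaches 1 once τ covers that method's worst ratio.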