evaluma.cli
===========

.. py:module:: evaluma.cli


Functions
---------

.. autoapisummary::

   evaluma.cli._parse_metric_direction
   evaluma.cli._common_options
   evaluma.cli._load_bench
   evaluma.cli._save
   evaluma.cli.main
   evaluma.cli.report
   evaluma.cli.rank
   evaluma.cli.aggregate
   evaluma.cli.compare
   evaluma.cli.frequentist
   evaluma.cli.profiles


Module Contents
---------------

.. py:function:: _parse_metric_direction(ctx, param, value)

   Parse ``KEY:min`` / ``KEY:max`` tokens into a metric-direction dict.

   :param ctx: Click context (unused; required by the callback protocol).
   :param param: Click parameter (unused).
   :param value: Tuple of strings, each formatted as ``"KEY:min"`` or
                 ``"KEY:max"``.

   :returns: Mapping from dataset name to ``"min"`` or ``"max"``,
             or ``None`` when ``value`` is empty.
   :rtype: dict | None

   :raises click.BadParameter: If a token is malformed or the direction is not
       ``"min"`` or ``"max"``.
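
   The parsing rules above can be sketched as a standalone function, for
   illustration only. It is not evaluma's code. The name
   ``parse_metric_direction`` is hypothetical. It raises ``ValueError`` where
   the real Click callback raises ``click.BadParameter``. Splitting on the last
   colon is also an assumption.

   .. code-block:: python

      from typing import Dict, Optional, Sequence


      def parse_metric_direction(tokens: Sequence[str]) -> Optional[Dict[str, str]]:
          """Hypothetical sketch: parse ("KEY:min", "KEY:max", ...) tokens."""
          if not tokens:
              return None  # mirrors the documented ``None`` for empty input
          result: Dict[str, str] = {}
          for token in tokens:
              # Assumption: split on the last colon, so the direction is the suffix.
              key, sep, direction = token.rpartition(":")
              if not sep or not key:
                  raise ValueError(f"expected KEY:min or KEY:max, got {token!r}")
              if direction not in ("min", "max"):
                  raise ValueError(f"direction must be 'min' or 'max', got {direction!r}")
              result[key] = direction
          return result

   For example, ``("mnist:max", "regression_ds:min")`` parses to
   ``{"mnist": "max", "regression_ds": "min"}``, while ``"mnist:up"`` is
   rejected.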


.. py:function:: _common_options(f)

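   Decorator that attaches the CLI options shared by every subcommand to a
   Click command.

   :param f: The Click command function to decorate.
   :returns: The decorated function.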
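
   One common way to implement a helper like this is to apply a fixed list of
   decorators in reverse order, so the list reads like stacked
   ``@click.option(...)`` lines. The sketch below shows that pattern using
   only the standard library. ``compose_decorators`` is a hypothetical helper,
   and the options evaluma actually attaches are not shown.

   .. code-block:: python

      from typing import Callable, Sequence, TypeVar

      F = TypeVar("F", bound=Callable[..., object])


      def compose_decorators(decorators: Sequence[Callable[[F], F]]) -> Callable[[F], F]:
          """Hypothetical helper: bundle several decorators into one."""

          def apply(f: F) -> F:
              # Apply in reverse so decorators[0] ends up outermost, exactly as
              # if the list had been written as stacked @-lines, top to bottom.
              for decorator in reversed(decorators):
                  f = decorator(f)
              return f

          return apply

   With Click, ``decorators`` would hold ``click.option(...)`` objects, and
   the returned callable would play the role of ``_common_options``.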


.. py:function:: _load_bench(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, seed=None)

   Load a CSV file and return a normalized ``Benchmark``, merging the CLI
   arguments with values from the optional YAML config.

   :param csv_path: Path to the input CSV file.
   :param model: CLI value for the model column name.
   :param dataset: CLI value for the dataset column name.
   :param metric: CLI value for the metric column name.
   :param score: CLI value for the score column name.
   :param config_path: Optional path to a YAML config file.
   :param metric_direction: Parsed metric-direction dict (or ``None``).
   :param output_dir: Path to the output directory (created if absent).
   :param seed: Optional column name for the random seed.

   :returns: Loaded and normalized benchmark.
   :rtype: Benchmark
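
   The precedence rule for the merge is not spelled out here. A common
   convention is that an explicit CLI value overrides the config file and
   ``None`` means "not given". The sketch below assumes that convention rather
   than taking it from evaluma. ``merge_cli_with_config`` is a hypothetical
   name.

   .. code-block:: python

      from typing import Any, Dict, Mapping, Optional


      def merge_cli_with_config(
          cli: Mapping[str, Optional[Any]], config: Mapping[str, Any]
      ) -> Dict[str, Any]:
          """Hypothetical sketch: CLI values override config; None means unset."""
          merged = dict(config)  # start from the config file's values
          for key, value in cli.items():
              if value is not None:  # only explicitly supplied CLI values win
                  merged[key] = value
          return merged

   For example, ``merge_cli_with_config({"model": "arch", "dataset": None},
   {"model": "model_name", "dataset": "task"})`` keeps the CLI ``model`` and
   falls back to the config ``dataset``.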


.. py:function:: _save(result, stem, output_dir)

   Serialize ``result`` to ``<stem>.csv`` and ``<stem>.png`` inside
   ``output_dir``.


.. py:function:: main()

   Top-level ``evaluma`` command: ML benchmark evaluation tools.


.. py:function:: report(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir)

   Run all three analyses and write results to ``--output``.


.. py:function:: rank(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, seed)

   Compute IQM (interquartile mean) rankings and write ``iqm_ranking.csv`` and
   ``iqm_ranking.png``. Requires a seed column.
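
   The interquartile mean is the mean of the middle 50% of scores, after the
   lowest and highest 25% are discarded. The sketch below trims
   ``int(0.25 * n)`` values from each end, the same convention as
   ``scipy.stats.trim_mean(x, 0.25)``. It then ranks models by the IQM of
   their pooled scores. Several parts are assumptions, not evaluma's
   implementation: the pooling over datasets and seeds, the higher-is-better
   orientation, and the helper names.

   .. code-block:: python

      from statistics import mean
      from typing import Dict, List, Sequence, Tuple


      def iqm(scores: Sequence[float]) -> float:
          """Interquartile mean: mean of the middle 50% of the sorted scores."""
          xs = sorted(scores)
          k = int(0.25 * len(xs))  # number of values trimmed from each end
          return mean(xs[k:len(xs) - k])


      def iqm_ranking(scores_by_model: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
          """Hypothetical sketch: rank models by IQM, best (highest) first."""
          table = [(model, iqm(scores)) for model, scores in scores_by_model.items()]
          return sorted(table, key=lambda row: row[1], reverse=True)

   For example, ``iqm([1, 2, 3, 4, 5, 6, 7, 8])`` keeps ``[3, 4, 5, 6]`` and
   returns ``4.5``. Trimming makes the estimate robust to a few outlier runs
   while still using more data than the median.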


.. py:function:: aggregate(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, agg)

   Compute a point-estimate aggregate ranking and write
   ``aggregate_ranking.csv`` and ``aggregate_ranking.png``.
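
   A point-estimate ranking reduces each model's per-dataset scores to a
   single number and sorts on it, without any uncertainty estimate. The
   sketch assumes scores are already normalized so that higher is better.
   ``"mean"`` and ``"median"`` are hypothetical values for ``agg``; the values
   evaluma actually accepts are not documented here. The function name is
   also hypothetical.

   .. code-block:: python

      import statistics
      from typing import Callable, Dict, List, Sequence, Tuple

      # Hypothetical mapping from ``agg`` names to reducers.
      AGGREGATORS: Dict[str, Callable[[Sequence[float]], float]] = {
          "mean": statistics.mean,
          "median": statistics.median,
      }


      def aggregate_ranking(
          scores: Dict[str, Dict[str, float]], agg: str = "mean"
      ) -> List[Tuple[str, float]]:
          """Hypothetical sketch: scores[model][dataset] -> ranked (model, value)."""
          if agg not in AGGREGATORS:
              raise ValueError(f"unknown aggregation {agg!r}")
          reducer = AGGREGATORS[agg]
          table = [(model, reducer(list(per_ds.values()))) for model, per_ds in scores.items()]
          return sorted(table, key=lambda row: row[1], reverse=True)

   The choice of ``agg`` can change the order. If model ``a`` scores
   ``0.9, 0.8, 0.0`` and model ``b`` scores ``0.6`` everywhere, ``"mean"``
   ranks ``b`` first but ``"median"`` ranks ``a`` first.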


.. py:function:: compare(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir)

   Compute Bayesian pairwise comparisons and write results.
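
   The Bayesian model evaluma uses is not described here. One illustrative
   technique is Rubin's Bayesian bootstrap over per-dataset score differences,
   combined with a region of practical equivalence (ROPE). This gives
   posterior probabilities that model A is better, that A and B are
   practically equivalent, or that B is better. The function name, the
   ``rope`` width, and the sample count below are hypothetical.

   .. code-block:: python

      import random
      from typing import Dict, Sequence


      def bayesian_bootstrap_pairwise(
          diffs: Sequence[float], rope: float = 0.01, n_samples: int = 4000, seed: int = 0
      ) -> Dict[str, float]:
          """Hypothetical sketch; diffs[i] = score_A - score_B on dataset i."""
          rng = random.Random(seed)
          counts = {"A": 0, "rope": 0, "B": 0}
          for _ in range(n_samples):
              # Normalized Exp(1) draws are a Dirichlet(1, ..., 1) weight vector.
              w = [rng.gammavariate(1.0, 1.0) for _ in diffs]
              m = sum(wi * d for wi, d in zip(w, diffs)) / sum(w)
              if m > rope:
                  counts["A"] += 1
              elif m < -rope:
                  counts["B"] += 1
              else:
                  counts["rope"] += 1
          return {k: v / n_samples for k, v in counts.items()}

   With every difference well above ``rope``, for example
   ``[0.10, 0.20, 0.15]``, every posterior draw favors A and the result is
   ``{"A": 1.0, "rope": 0.0, "B": 0.0}``.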


.. py:function:: frequentist(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir, reference, alpha)

   Run a Friedman test with Nemenyi post-hoc comparisons (all pairs), or
   Wilcoxon tests with Holm correction against the ``reference`` model, at
   significance level ``alpha``.
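
   The Holm step can be shown on its own. Suppose there is one p-value per
   reference-versus-other comparison, for example from
   ``scipy.stats.wilcoxon``. Holm's step-down procedure sorts the p-values in
   ascending order and rejects while the :math:`i`-th smallest (0-based)
   satisfies :math:`p \le \alpha / (m - i)`, stopping at the first failure.
   The sketch below is an illustration, not evaluma's code. ``holm_reject``
   is a hypothetical name.

   .. code-block:: python

      from typing import List, Sequence


      def holm_reject(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
          """Holm-Bonferroni step-down: True where the null hypothesis is rejected."""
          m = len(pvalues)
          order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
          reject = [False] * m
          for step, idx in enumerate(order):
              if pvalues[idx] <= alpha / (m - step):
                  reject[idx] = True
              else:
                  break  # every larger p-value is retained as well
          return reject

   For example, ``holm_reject([0.01, 0.04, 0.03])`` returns
   ``[True, False, False]``. The value 0.01 passes 0.05/3, but 0.03 fails
   0.05/2, which stops the procedure.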


.. py:function:: profiles(csv_path, model, dataset, metric, score, config_path, metric_direction, output_dir)

   Compute Dolan-Moré performance profiles and write results.
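
   A Dolan-Moré profile gives, for each model :math:`s`, the fraction of
   datasets on which its cost is within a factor :math:`\tau` of the best
   cost on that dataset. Formally,
   :math:`\rho_s(\tau) = \frac{1}{|P|} \, \#\{p \in P : r_{p,s} \le \tau\}`,
   with :math:`r_{p,s} = t_{p,s} / \min_{s'} t_{p,s'}`. The sketch assumes
   positive costs where lower is better, such as error rates. How evaluma
   handles higher-is-better metrics is not documented here, and the function
   name is hypothetical.

   .. code-block:: python

      from typing import Dict, List, Sequence


      def performance_profile(
          costs: Dict[str, Dict[str, float]], taus: Sequence[float]
      ) -> Dict[str, List[float]]:
          """Hypothetical sketch: costs[model][dataset] > 0, lower is better."""
          models = list(costs)
          datasets = list(next(iter(costs.values())))  # assumes all models share datasets
          best = {d: min(costs[m][d] for m in models) for d in datasets}
          # Performance ratio of each model to the best model on each dataset.
          ratios = {m: [costs[m][d] / best[d] for d in datasets] for m in models}
          # rho_s(tau): fraction of datasets whose ratio is at most tau.
          return {
              m: [sum(r <= tau for r in ratios[m]) / len(datasets) for tau in taus]
              for m in models
          }

   :math:`\rho_s(1)` is the fraction of datasets on which model :math:`s` is
   best (ties count for every tied model). A curve that rises faster to 1
   indicates a model that is rarely far from the best.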