# References

## Methods

**Interquartile Mean (IQM)**

Agarwal, R., Schwarzer, M., Castro, P. S., Courville, A. C., & Bellemare, M. G. (2021).
Deep Reinforcement Learning at the Edge of the Statistical Precipice.
*Advances in Neural Information Processing Systems*, 34.
<https://arxiv.org/abs/2108.13264>

---

**Bayesian Pairwise Comparison**

Benavoli, A., Corani, G., Demšar, J., & Zaffalon, M. (2017).
Time for a Change: a Tutorial for Comparing Multiple Classifiers Through Bayesian Analysis.
*Journal of Machine Learning Research*, 18(77), 1–36.
<https://jmlr.org/papers/v18/16-305.html>

---

**Dolan-Moré Performance Profiles**

Dolan, E. D., & Moré, J. J. (2002).
Benchmarking Optimization Software with Performance Profiles.
*Mathematical Programming*, 91(2), 201–213.
<https://doi.org/10.1007/s101070100263>

---

## Libraries

**baycomp**

Demšar, J. *baycomp: Bayesian comparison of classifiers*.
<https://github.com/janezd/baycomp>