A Practical Taxonomy of Reproducibility for Machine Learning Research

Discussions of reproducibility in science are often framed from the perspective of scientists and researchers who want to validate published claims. A complementary perspective is that of the practitioner who sets out to apply a new computational method within their own domain, and whose first step is often to reproduce the published results as a check on the correctness of the code. In this paper we discuss a taxonomy of reproducibility from the practitioner's perspective. Low-reproducibility studies merely describe algorithms; medium-reproducibility studies provide the code and data but not the computational environment in which the code can be run; and high-reproducibility studies provide the code, data, and full computational environment necessary to reproduce the results of the study. We identify exemplars of each level of reproducibility from the machine learning literature, motivate the case for high-reproducibility studies, and discuss concrete tools and strategies for researchers who wish to ensure easy adoption of their methods by practitioners.