Independently validating experimental results in the field of computer systems research is a challenging task. Recreating an environment that resembles the one where an experiment was originally executed is a time-consuming endeavor. In this article, we present Popper [1], a convention (or protocol) for conducting experiments following a DevOps [2] approach that allows researchers to make all associated artifacts publicly available, with the goal of maximizing automation in the re-execution of an experiment and the validation of its results.

A basic expectation in the practice of the scientific method is to document, archive, and share all data and methodologies used, so that other scientists can reproduce and verify scientific results and students can learn how they were derived. However, in the scientific branches of computation and data exploration, the lack of reproducibility has led to a credibility crisis. As more scientific disciplines rely on computational methods and data-intensive exploration, it has become urgent to develop software tools that help document dependencies on data products, methodologies, and computational environments; that safely archive data products and documentation; and that reliably share data products and documentation so that scientists can rely on their availability.

Over the last decade, the software engineering and systems administration communities (also referred to as DevOps) have developed sophisticated techniques and strategies to ensure "software reproducibility," i.e., the reproducibility of software artifacts and their behavior, using versioning, dependency management, containerization, orchestration, monitoring, testing, and documentation. The key idea behind the Popper Convention is to manage every experiment in computation and data exploration as a software project, using tools and services that are readily available now and enjoy wide popularity. By doing so, scientific explorations become reproducible with the same convenience, efficiency, and scalability as software reproducibility, while fully leveraging continuing improvements to these tools and services. Rather than mandating a particular set of tools, the Convention requires that the tool set as a whole implement the functionality necessary for software reproducibility.

There are two main goals for Popper:

1. It should be usable in as many research projects as possible, regardless of domain.
2. It should abstract underlying technologies without requiring a strict set of tools, making it possible to apply it across multiple toolchains.
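To make the idea of treating an experiment as a software project more concrete, the following is a minimal sketch (not taken from the Popper paper) of a validation script that re-executes an experiment inside a versioned container and checks its output against archived baseline figures, much as a test suite checks a software project's behavior. The container image name, file paths, metric names, and tolerance are all hypothetical.

import json
import os
import subprocess
import sys

CONTAINER_IMAGE = "example/experiment:v1.0"   # hypothetical, pinned experiment image
RESULTS_FILE = "results/metrics.json"         # hypothetical output produced by the run
BASELINE_FILE = "baseline/metrics.json"       # hypothetical archived baseline figures
TOLERANCE = 0.05                              # arbitrary 5% tolerance

def run_experiment():
    # Re-execute the experiment inside a versioned container so that the software
    # environment is pinned, analogous to dependency management in DevOps.
    subprocess.run(
        ["docker", "run", "--rm",
         "-v", os.getcwd() + "/results:/results",
         CONTAINER_IMAGE],
        check=True,
    )

def validate_results():
    # Compare freshly produced metrics against the archived baseline, the way a
    # test suite compares observed behavior against expected behavior.
    with open(RESULTS_FILE) as f:
        observed = json.load(f)
    with open(BASELINE_FILE) as f:
        expected = json.load(f)
    for metric, baseline in expected.items():
        deviation = abs(observed[metric] - baseline) / abs(baseline)
        if deviation > TOLERANCE:
            sys.exit("validation failed for %s: %s deviates %.1f%% from %s"
                     % (metric, observed[metric], 100 * deviation, baseline))
    print("all metrics within tolerance; results validated")

if __name__ == "__main__":
    run_experiment()
    validate_results()

In a Popper-style repository, a script along these lines would live next to the experiment's code and data, and a continuous integration service could invoke it on every change so that validation failures are caught automatically.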
[1] Victoria Stodden et al. ResearchCompendia.org: Cyberinfrastructure for Reproducibility and Collaboration in Computational Science. Computing in Science & Engineering, 2015.
[2] Carlos Maltzahn et al. GassyFS: An In-Memory File System That Embraces Volatility. 2016.
[3] Carlos Maltzahn et al. Popper: Making Reproducible Systems Performance Evaluation Practical.
[4] W. Marsden. I and J. 2012.
[5] Michael Hüttermann. DevOps for Developers. Apress, 2012.
[6] Torsten Hoefler et al. Scientific Benchmarking of Parallel Computing Systems: Twelve Ways to Tell the Masses When Reporting Performance Results. 2017.
[7] Christian S. Collberg et al. Repeatability in Computer Systems Research. Commun. ACM, 2016.