Popper: Making Reproducible Systems Performance Evaluation Practical

Independent validation of experimental results in the field of parallel and distributed systems research is a challenging task, mainly due to changes and differences in the software and hardware of computational environments. Recreating an environment that resembles the one used in the original study is difficult and time-consuming. In this paper we introduce the Popper Convention, a set of principles for producing reproducible scientific publications. Concretely, we make the case for treating an article as an open source software (OSS) project, applying software engineering best practices to manage its associated artifacts and maintain the reproducibility of its findings, and leveraging existing cloud-computing infrastructure and modern OSS development tools to produce academic articles that are easy to validate. We present our prototype file system, GassyFS, as a use case illustrating the usefulness of this approach. We show how, by following Popper, re-executing experiments on multiple platforms becomes practical, allowing reviewers and students to quickly reach the point of obtaining results without relying on the authors' intervention.
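To make the idea of "an article as an OSS project" concrete, the following is a minimal sketch of the kind of automation such a repository might ship with. It is not the Popper toolchain itself: the layout (experiments/<name>/run.sh producing results.json, alongside a baseline.json of published metrics) and the 10% tolerance are assumptions made for illustration only. The script re-executes each experiment and checks the fresh results against the stored baseline, which is the sort of push-button validation the convention aims to enable for reviewers and students.

#!/usr/bin/env python3
"""Sketch of a Popper-style experiment re-execution and validation script.

Assumed (hypothetical) repository layout:
  experiments/<name>/run.sh        -> re-runs the experiment, writes results.json
  experiments/<name>/baseline.json -> metrics reported in the article
"""
import json
import subprocess
import sys
from pathlib import Path

TOLERANCE = 0.10  # accept results within 10% of the published baseline (illustrative choice)


def run_experiment(exp_dir: Path) -> dict:
    """Re-execute one experiment and return the metrics it reports."""
    subprocess.run(["bash", "run.sh"], cwd=exp_dir, check=True)
    with open(exp_dir / "results.json") as f:
        return json.load(f)


def validate(exp_dir: Path) -> bool:
    """Compare re-executed metrics against the stored baseline."""
    with open(exp_dir / "baseline.json") as f:
        baseline = json.load(f)
    results = run_experiment(exp_dir)
    ok = True
    for metric, expected in baseline.items():
        observed = results.get(metric)
        if observed is None or abs(observed - expected) > TOLERANCE * abs(expected):
            print(f"  {metric}: expected ~{expected}, got {observed}  [FAIL]")
            ok = False
        else:
            print(f"  {metric}: {observed}  [ok]")
    return ok


def main() -> int:
    failures = 0
    for exp_dir in sorted(Path("experiments").iterdir()):
        if not (exp_dir / "run.sh").exists():
            continue
        print(f"== {exp_dir.name} ==")
        if not validate(exp_dir):
            failures += 1
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())

In practice such a script would run inside a continuous-integration service on every change to the repository, so that a regression in reproducibility is caught the same way a failing unit test would be.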
