Scientific Tests and Continuous Integration Strategies to Enhance Reproducibility in the Scientific Software Context

Continuous integration (CI) is a well-established technique in commercial and open-source software projects, although it is not routinely used in scientific publishing. In the scientific software context, CI can serve two functions to increase the reproducibility of scientific results: providing an established platform for testing the reproducibility of these results, and demonstrating to other scientists how the code and data generate the published results. We explore scientific software testing and CI strategies using two articles published in the areas of applied mathematics and computational physics. We discuss lessons learned from reproducing these articles and examine the tests the authors already provide. We introduce the notion of a "scientific test" as one that produces computational results from a published article. We then consider full result reproduction within a CI environment. If authors find their work too time- or resource-intensive to adapt easily to a CI context, we recommend including results from reduced versions of their work (e.g., runs at lower resolution, over shorter time scales, or with smaller data sets) alongside the primary results in their article. While these smaller versions may be less interesting scientifically, they can serve to verify that the published code and data are working properly. We demonstrate such reduction tests on the two articles studied.
