An Effective Git And Org-Mode Based Workflow For Reproducible Research

In this paper we address the question of developing a lightweight and effective workflow for conducting experimental research on modern parallel computer systems in a reproducible way. Our approach builds on two well-known tools (Git and Org-mode) and makes it possible to address, at least partially, issues such as running experiments, tracking provenance, reconstructing the experimental setup, and making the analysis replicable. We have been using this methodology for two years, and it recently enabled us to publish a fully reproducible article [12]. To demonstrate the effectiveness of our proposal, we have opened our two-year laboratory notebook together with all the attached experimental data. This notebook and the underlying Git revision control system illustrate the workflow we used and make it easier to understand.

[1] Joseph Emeras, et al. Reproducible Software Appliances for Experimentation, 2014, TRIDENTCOM.

[2] Jens Gustedt, et al. A Workflow-Inspired, Modular and Robust Approach to Experiments in Distributed Systems, 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.

[3] Olivier Richard, et al. Managing Large Scale Experiments in Distributed Testbeds, 2013.

[4] Jonathan Rouzaud-Cornabas, et al. Using the EXECO Toolkit to Perform Automatic and Reproducible Cloud Experiments, 2013, 2013 IEEE 5th International Conference on Cloud Computing Technology and Science.

[5] Terry V. Benzel, et al. The DETER project: Advancing the science of cyber security experimentation and test, 2010, 2010 IEEE International Conference on Technologies for Homeland Security (HST).

[6] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.

[7] Dan Davison, et al. A Multi-Language Computing Environment for Literate Programming and Reproducible Research, 2012.

[8] C. Drummond. Replicability is not Reproducibility: Nor is it Good Science, 2009.

[9] Arnaud Legrand, et al. Companion of the StarPU+SimGrid article, 2014.

[10] Matthias Hauswirth, et al. Producing wrong data without doing anything obviously wrong!, 2009, ASPLOS.

[11] Jean-François Méhaut, et al. Modeling and Simulation of a Dynamic Task-Based Runtime System for Heterogeneous Multi-core Architectures, 2014, Euro-Par.

[12] Jean-François Méhaut, et al. Performance analysis of HPC applications on low-power embedded platforms, 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).

[13] Henri Casanova, et al. Versatile, scalable, and accurate simulation of distributed applications and platforms, 2014, J. Parallel Distributed Comput.

[14] Victoria Stodden, et al. Implementing Reproducible Research, 2018.

[15] Konrad Hinsen, et al. A data and code model for reproducible research and executable papers, 2011, ICCS.