Experimenting with reproducibility: a case study of robustness in bioinformatics

Abstract

Reproducibility has been shown to be limited in many scientific fields. Although reproducibility is a fundamental tenet of scientific activity, the related issue of the reusability of scientific data remains poorly documented. Here, we present a case study of our difficulties in reproducing a published bioinformatics method, even though its code and data were available. First, we tried to re-run the analysis with the code and data provided by the authors. Second, we reimplemented the whole method in a Python package to avoid dependence on a MATLAB license and to ease execution of the code on a high-performance computing cluster. Third, we assessed the reusability of our reimplementation and the quality of our documentation, testing how easy it would be to start from our implementation and reproduce the results. In a second part, we propose solutions, drawn from this case study and other observations, to improve reproducibility and research efficiency at both the individual and collective levels. While finalizing our code, we created case-specific documentation and tutorials for the associated Python package, StratiPy. Readers are invited to experiment with our reproducibility case study by generating the two confusion matrices (see the section "Robustness: from MATLAB to Python, language and organization"). We propose two options: a step-by-step process to follow in a Jupyter/IPython notebook or a Docker container ready to be built and run.
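As a minimal illustration of what "generating the two confusion matrices" involves, the sketch below compares cluster assignments produced by the original MATLAB run with those produced by the Python (StratiPy) reimplementation. The file names, file format, and label encoding are assumptions for illustration only; they are not taken from the published study or the StratiPy API.

```python
# Hypothetical sketch: cross-tabulating patient subtypes from two runs of the
# same stratification method (original MATLAB code vs. Python reimplementation).
# File names and formats below are illustrative assumptions.
import numpy as np
from sklearn.metrics import confusion_matrix

# One integer cluster label per patient, one label per line (assumed format).
matlab_labels = np.loadtxt("clusters_matlab.txt", dtype=int)
python_labels = np.loadtxt("clusters_python.txt", dtype=int)

# Rows: subtypes from the MATLAB run; columns: subtypes from the Python run.
# A matrix that is diagonal up to a permutation of labels indicates that the
# two implementations stratify patients in the same way.
cm = confusion_matrix(matlab_labels, python_labels)
print(cm)
```

In practice, cluster labels are arbitrary, so the columns may need to be permuted before reading the matrix as "agreement on the diagonal"; the paper's notebook and Docker options package this comparison end to end.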
