Advantages and Limits in the Adoption of Reproducible Research and R-Tools for the Analysis of Omic Data

Reproducible (computational) Research is crucial for producing transparent, high-quality scientific papers. First, we illustrate the benefits that the scientific community can gain from adopting Reproducible Research standards in the analysis of high-throughput omic data. Then, we describe several tools that help researchers increase the reproducibility of their work. We also discuss the advantages and limits of Reproducible Research and how those limits could be addressed. Overall, this paper should be read as a proof of concept of how a study can be conducted in the spirit of Reproducible Research and of which characteristics, in our opinion, such a study should have. The scope of this paper is therefore two-fold. The first goal is to present and discuss some easy-to-use instruments that data analysts can use to promote Reproducible Research in their analyses. The second is to encourage developers to incorporate automatic reproducibility features into their tools.
