Distributed Reproducible Research Using Cached Computations

The ability to make scientific findings reproducible is increasingly important. The authors describe a simple framework in which scientists can perform and distribute reproducible research via cached computations, and they present a prototype implementation along with a case study application.
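To make the idea of cached computations concrete, here is a minimal sketch in Python. It is not the authors' prototype, and the abstract gives no implementation details; the class name `ComputationCache`, its methods, and the choice to key each result by a SHA-256 digest of the expression's source text are all assumptions made for illustration. The sketch stores each evaluated result on disk so that a later run, or another person who receives the cache directory, can load the stored value instead of recomputing it.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class ComputationCache:
    """Hypothetical on-disk cache: results are keyed by a digest of the source text."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.evaluations = 0  # number of expressions actually executed

    def _path_for(self, source):
        # Assumption: identical source text means an identical result.
        digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.pkl"

    def evaluate(self, source):
        """Return the value of the expression, loading it from disk if cached."""
        path = self._path_for(source)
        if path.exists():
            with path.open("rb") as handle:
                return pickle.load(handle)
        value = eval(source)  # only ever run trusted analysis code this way
        self.evaluations += 1
        with path.open("wb") as handle:
            pickle.dump(value, handle)
        return value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as cache_dir:
        author = ComputationCache(cache_dir)
        print(author.evaluate("sum(i * i for i in range(10))"))
        # A second cache object on the same directory stands in for a reader
        # who received the cached results and does not need to recompute them.
        reader = ComputationCache(cache_dir)
        print(reader.evaluate("sum(i * i for i in range(10))"))
        print(reader.evaluations)
```

Keying on the source text alone is the simplest possible design and ignores inputs such as data files or earlier results. A more complete scheme would also fold those dependencies into the key.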
