It is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, and so on with the documents that describe and rely on them. This integration allows readers to both verify and adapt the claims in the documents. Authors can easily reproduce the results in the future, and they can present the document's contents in a different medium, for example, with interactive controls. This article describes a software framework for both authoring and distributing these integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations. The documents are dynamic in that the contents—including figures, tables, and so on—can be recalculated each time a view of the document is generated. Our model treats a dynamic document as a master or “source” document from which one can generate different views in the form of traditional, derived documents for different audiences. We introduce the concept of a compendium as a container for one or more dynamic documents and the different elements needed when processing them, such as code and data. The compendium serves as a means for distributing, managing, and updating the collection. The step from disseminating analyses via a compendium to reproducible research is a small one. By reproducible research, we mean research papers with accompanying software tools that allow the reader to directly reproduce the results and employ the computational methods that are presented in the research paper. Some of the issues involved in paradigms for the production, distribution, and use of such reproducible research are discussed.