Statistical Analyses and Reproducible Research

It is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, and so on with the documents that describe and rely on them. This integration allows readers to both verify and adapt the claims in the documents. Authors can easily reproduce the results in the future, and they can present the document's contents in a different medium, for example, with interactive controls. This article describes a software framework for both authoring and distributing these integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations. The documents are dynamic in that the contents—including figures, tables, and so on—can be recalculated each time a view of the document is generated. Our model treats a dynamic document as a master or “source” document from which one can generate different views in the form of traditional, derived documents for different audiences. We introduce the concept of a compendium as a container for one or more dynamic documents and the different elements needed when processing them, such as code and data. The compendium serves as a means for distributing, managing, and updating the collection. The step from disseminating analyses via a compendium to reproducible research is a small one. By reproducible research, we mean research papers with accompanying software tools that allow the reader to directly reproduce the results and employ the computational methods that are presented in the research paper. Some of the issues involved in paradigms for the production, distribution, and use of such reproducible research are discussed.