Reproducible Research Workflow in R for the Analysis of Personalized Human Microbiome Data

This article presents a reproducible research workflow for amplicon-based microbiome studies in personalized medicine, created using Bioconductor packages and the knitr markdown interface. We show that the multiplicity of choices and the lack of consistent documentation at each stage of the sequential processing pipeline used to analyze microbiome data can sometimes lead to spurious results. We propose replacing such pipelines with a reproducible and documented analysis built on the R packages dada2, knitr, and phyloseq. This workflow implements both key stages of amplicon analysis: the initial filtering and denoising steps needed to construct taxonomic feature tables from error-containing sequencing reads (dada2), and the exploratory and inferential analysis of those feature tables and associated sample metadata (phyloseq). This workflow facilitates reproducible interrogation of the full set of choices required in microbiome studies. We present several examples in which we leverage existing packages for analysis in a way that allows easy sharing and modification by others, and we point to articles that rely on this reproducible workflow for longitudinal and spatial series analyses of the vaginal microbiome in pregnancy and of the oral microbiome in humans with healthy dentition and intra-oral tissues.
