Rhea: a transparent and modular R pipeline for microbial profiling based on 16S rRNA gene amplicons

Distributed under Creative Commons CC-BY 4.0

16S rRNA gene amplicon profiling has become central to understanding the influence of microbes in a variety of environments and, coupled with the steep reduction in sequencing costs, has led to a surge of microbial sequencing projects. The growing number of scientists and clinicians who want to use sequencing datasets can choose among a range of multipurpose software platforms, but these can be intimidating for non-expert users. Among the available options for high-throughput 16S rRNA gene analysis, the R programming language and software environment for statistical computing stands out for its power and flexibility, and for the possibility to follow the most recent best practices and to adjust to individual project needs. Here we present the Rhea pipeline, a set of R scripts that encode a series of well-documented choices for the downstream analysis of Operational Taxonomic Unit (OTU) tables, including normalization steps, alpha- and beta-diversity analysis, taxonomic composition, statistical comparisons, and calculation of correlations. Rhea is primarily a straightforward starting point for beginners, but it can also serve as a framework for advanced users, who can modify and expand the tool. As community standards evolve, Rhea will be adapted to reflect the current state of the art in microbial profile analysis, in the clear and comprehensive way allowed by the R language. Rhea scripts and documentation are freely available at https://lagkouvardos.github.io/Rhea.
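
To make the downstream steps named above more concrete, the following is a minimal sketch of three of them on a toy OTU table: normalization to relative abundances, an alpha-diversity estimate, and a beta-diversity distance. It is written in Python purely as an illustration and is not Rhea's R code. The table values, the function names, the use of the Shannon effective number for alpha-diversity, and the use of Bray-Curtis dissimilarity for beta-diversity are all assumptions made for this example; the abstract does not state which measures Rhea implements.

```python
import math

# Hypothetical toy OTU table (assumption): raw read counts per OTU and sample.
otu_table = {
    "OTU_1": {"S1": 50, "S2": 10},
    "OTU_2": {"S1": 30, "S2": 10},
    "OTU_3": {"S1": 20, "S2": 0},
}


def sample_counts(table, sample):
    """Return the raw counts of one sample, in the table's OTU order."""
    return [row[sample] for row in table.values()]


def relative_abundance(counts):
    """Normalize raw counts so that they sum to 1 within the sample."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def shannon_effective(counts):
    """Alpha-diversity as exp(Shannon entropy), the effective number of OTUs."""
    props = [p for p in relative_abundance(counts) if p > 0]
    entropy = -sum(p * math.log(p) for p in props)
    return math.exp(entropy)


def bray_curtis(counts_a, counts_b):
    """Beta-diversity as Bray-Curtis dissimilarity on relative abundances."""
    pa = relative_abundance(counts_a)
    pb = relative_abundance(counts_b)
    numerator = sum(abs(x - y) for x, y in zip(pa, pb))
    return numerator / (sum(pa) + sum(pb))


s1 = sample_counts(otu_table, "S1")
s2 = sample_counts(otu_table, "S2")
alpha_s1 = shannon_effective(s1)
alpha_s2 = shannon_effective(s2)  # two equally abundant OTUs give 2.0
beta_s1_s2 = bray_curtis(s1, s2)
```

Expressing alpha-diversity as an effective number keeps the value on an intuitive scale: a sample with two equally abundant OTUs scores exactly 2. A real analysis would read the OTU table from a file and would typically use a phylogeny-aware distance where a tree is available.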
