phyloseq: A Bioconductor Package for Handling and Analysis of High-Throughput Phylogenetic Sequence Data

We present a detailed description of a new Bioconductor package, phyloseq, for the integrated handling and analysis of taxonomically clustered phylogenetic sequencing data in conjunction with related data types. The phyloseq package integrates abundance data, phylogenetic information, and covariates so that exploratory transformations and plots, confirmatory testing, and diagnostic plots can be carried out seamlessly. The package is built on the S4 object-oriented framework of the R language, so that once the data have been imported the user can easily transform, plot, and analyze them. We present examples that highlight these methods and the ease with which existing packages can be leveraged.
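
As a rough illustration of the integration described above, the sketch below combines an abundance table and sample covariates into a single object and then applies a transformation to it. It assumes the constructor and accessor names found in recent phyloseq releases (`otu_table`, `sample_data`, `phyloseq`, `transform_sample_counts`); the version described here may use different names. The counts, taxa labels, and the `site` covariate are invented for the example.

```r
library(phyloseq)

# Toy abundance matrix: 3 taxa (rows) x 4 samples (columns); values are invented.
counts <- matrix(
  c(10, 0, 5, 3,
     2, 8, 0, 1,
     0, 4, 6, 9),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("OTU", 1:3), paste0("S", 1:4))
)

# Toy sample covariates; row names must match the sample names above.
covariates <- data.frame(
  site = c("gut", "gut", "soil", "soil"),
  row.names = colnames(counts)
)

# Combine both components into one object (names assumed from recent releases).
ps <- phyloseq(
  otu_table(counts, taxa_are_rows = TRUE),
  sample_data(covariates)
)

# Exploratory transformation: convert counts to within-sample proportions.
rel <- transform_sample_counts(ps, function(x) x / sum(x))
```

Because the components travel together in one object, the transformed result `rel` still carries the sample covariates, so it could be handed directly to plotting or testing code without re-matching sample names by hand.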
