Phoenix 2: a locally installable large-scale 16S rRNA gene sequence analysis pipeline with Web interface.

We have developed Phoenix 2, a ribosomal RNA gene sequence analysis pipeline that can process large-scale datasets consisting of more than one hundred environmental samples and containing more than one million reads collectively. Large datasets are handled rapidly through the removal of redundant sequences, pre-partitioning of sequences, parallelized clustering per partition, and subsequent merging of clusters. The pipeline is built from a combination of open-source software tools and custom-developed Perl scripts. For our project we use hardware-accelerated searches; the pipeline can also be reconfigured to run on generic computing infrastructure alone, at the cost of a considerable reduction in speed. Phoenix 2 produces a comprehensive set of analysis results, including taxonomic annotations from multiple methods, alpha diversity indices, beta diversity measurements, and a number of visualizations. To date, the pipeline has been used to analyze more than 1500 environmental samples from a wide variety of microbial communities, which are part of our Hydrocarbon Metagenomics Project (http://www.hydrocarbonmetagenomics.com). The software package can be installed as a local software suite with a Web interface. Phoenix 2 is freely available from http://sourceforge.net/projects/phoenix2.
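The four speed-up steps named in the abstract (removal of redundant sequences, pre-partitioning, parallelized clustering per partition, merging of clusters) can be illustrated with a minimal sketch. This is not the Phoenix 2 implementation, which is built from open-source tools and Perl scripts; it is a Python illustration under stated assumptions. The prefix-based partition key, the position-wise `identity` score, the greedy seed-based clustering, the 0.97 threshold, and all function names are hypothetical choices made for the example, and the merge step here is a plain concatenation of per-partition clusters.

```python
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def dereplicate(reads):
    """Collapse identical reads into a {sequence: count} mapping."""
    return Counter(reads)


def partition(unique_seqs, prefix_len=4):
    """Group unique sequences by their leading bases (hypothetical partition key)."""
    parts = defaultdict(list)
    for seq in unique_seqs:
        parts[seq[:prefix_len]].append(seq)
    return parts


def identity(a, b):
    """Fraction of matching positions; a stand-in for a real alignment score."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, counts, threshold=0.97):
    """Greedy clustering: the most abundant sequences become seeds first."""
    clusters = []  # list of (seed, members)
    for seq in sorted(seqs, key=lambda s: (-counts[s], s)):
        for seed, members in clusters:
            if identity(seed, seq) >= threshold:
                members.append(seq)
                break
        else:
            # No existing seed is similar enough, so start a new cluster.
            clusters.append((seq, [seq]))
    return clusters


def run(reads, prefix_len=4, threshold=0.97):
    """Return {seed sequence: total read count} over all partitions."""
    counts = dereplicate(reads)
    parts = partition(counts, prefix_len)
    # Each partition is clustered independently, so the work can be farmed out.
    with ThreadPoolExecutor() as pool:
        per_part = list(
            pool.map(lambda seqs: greedy_cluster(seqs, counts, threshold),
                     parts.values())
        )
    # Merge step (assumption): concatenate the per-partition clusters.
    merged = [cluster for clusters in per_part for cluster in clusters]
    return {seed: sum(counts[m] for m in members) for seed, members in merged}


if __name__ == "__main__":
    reads = ["ACGTACGTAA"] * 3 + ["ACGTACGTAT"] + ["TTTTACGTAA"] * 2
    print(run(reads, prefix_len=4, threshold=0.85))
```

Threads are used only to keep the sketch self-contained; for CPU-bound clustering in Python a process pool or separate worker jobs would be the more realistic choice. Dereplication carries most of the benefit in this toy case: six reads collapse to three unique sequences before any pairwise comparison is made.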
