TaxCollector: Modifying Current 16S rRNA Databases for the Rapid Classification at Six Taxonomic Levels

The high level of conservation of the 16S ribosomal RNA (16S rRNA) gene across all prokaryotes makes it an ideal tool for the rapid identification and classification of these microorganisms. Databases such as the Ribosomal Database Project II (RDP-II) and the Greengenes Project provide ribosomal RNA sequence sets that are useful for identifying microbes in culture-independent analyses of microbial communities. However, these databases do not attach all of the taxonomic levels to the published names of the bacterial and archaeal sequences. TaxCollector is a set of scripts written in Python that attaches taxonomic information to all 16S rRNA sequences in the RDP-II and Greengenes databases. The modified databases are referred to as TaxCollector databases. Used in conjunction with BLAST, they allow rapid classification of sequences from any environmental or clinical source at six taxonomic levels, from domain to species. The TaxCollector database prepared from the RDP-II database is an important component of a new 16S rRNA pipeline called PANGEA. The usefulness of TaxCollector databases is demonstrated with two very different datasets, one obtained from samples in a clinical setting and one from an agricultural soil. The six TaxCollector scripts are freely available at http://taxcollector.sourceforge.net and at http://www.microgator.org.
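
The idea of attaching a full lineage to each database sequence can be illustrated with a minimal Python sketch. This is not the TaxCollector code: the function names, the `[index]name` header layout, and the in-memory `lineages` lookup table are all assumptions made for illustration. The abstract does not list which six ranks are used, so the sketch only checks that six levels are supplied and leaves their meaning to the caller.

```python
from typing import Dict, Iterable, Iterator, List

# The abstract states six levels, from domain to species; the exact ranks
# are not listed there, so only the count is enforced here.
N_LEVELS = 6


def format_lineage(lineage: List[str]) -> str:
    """Join a six-level lineage into one header-safe string (hypothetical layout)."""
    if len(lineage) != N_LEVELS:
        raise ValueError(f"expected {N_LEVELS} levels, got {len(lineage)}")
    # Prefix each level with its index and replace spaces so the lineage
    # survives tools that cut FASTA headers at the first whitespace.
    return ";".join(
        f"[{i}]{name.replace(' ', '_')}" for i, name in enumerate(lineage)
    )


def annotate_fasta(
    lines: Iterable[str], lineages: Dict[str, List[str]]
) -> Iterator[str]:
    """Yield FASTA lines, rewriting headers whose sequence ID has a known lineage."""
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split()
            seq_id = parts[0] if parts else ""
            lineage = lineages.get(seq_id)
            if lineage is not None:
                yield f">{seq_id} {format_lineage(lineage)}"
                continue
        # Sequence lines and headers without a known lineage pass through unchanged.
        yield line
```

With every header carrying its lineage, the top BLAST hit for a query sequence can be read directly as a classification at each level, without a second lookup step. The placeholder lineage values a caller passes in would come from whatever taxonomy source is available; that source is outside the scope of this sketch.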
