Automated group assignment in large phylogenetic trees using GRUNT: GRouping, Ungrouping, Naming Tool

Background
Accurate taxonomy is best maintained if species are arranged as hierarchical groups in phylogenetic trees. This is especially important as trees grow larger as a consequence of a rapidly expanding sequence database. Hierarchical group names are typically assigned to trees manually, an approach that becomes unfeasible for very large topologies.

Results
We have developed an automated iterative procedure for delineating stable (monophyletic) hierarchical groups in large (or small) trees and for naming those groups according to a set of sequentially applied rules. In addition, we have created an associated ungrouping tool for removing existing groups that do not meet user-defined criteria (such as monophyly). The procedure is implemented in a program called GRUNT (GRouping, Ungrouping, Naming Tool) and has been applied to the current release of the Greengenes (Hugenholtz) 16S rRNA gene taxonomy, which comprises more than 130,000 taxa.

Conclusion
GRUNT will assist researchers who require comprehensive hierarchical grouping of large tree topologies in, for example, database curation, microarray design and pangenome assignments. The application is available at the Greengenes website [1].
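The abstract does not spell out GRUNT's grouping rules or internal data structures. The Python sketch below illustrates the monophyly criterion that the ungrouping step can apply. It assumes a rooted tree and existing groups given as sets of taxon names. The names `Node`, `clade_leaf_sets`, `is_monophyletic` and `ungroup` are hypothetical and are not GRUNT's API. A group counts as monophyletic when its members are exactly the leaves beneath some single node.

```python
# Hypothetical sketch of a monophyly-based ungrouping step; not GRUNT's implementation.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A node in a rooted tree; leaves carry taxon names (assumed structure)."""
    name: str = ""
    children: list[Node] = field(default_factory=list)


def clade_leaf_sets(root: Node) -> set[frozenset[str]]:
    """Return the leaf-name set of every clade (node) in the tree.

    Uses an explicit stack instead of recursion so that very deep trees
    do not hit Python's recursion limit.
    """
    leaves_of: dict[int, frozenset[str]] = {}
    clades: set[frozenset[str]] = set()
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        if node.children and not expanded:
            # Revisit this node after all of its children have been processed.
            stack.append((node, True))
            stack.extend((child, False) for child in node.children)
            continue
        if node.children:
            leaves = frozenset().union(*(leaves_of[id(c)] for c in node.children))
        else:
            leaves = frozenset([node.name])
        leaves_of[id(node)] = leaves
        clades.add(leaves)
    return clades


def is_monophyletic(members: set[str], clades: set[frozenset[str]]) -> bool:
    """In a rooted tree, a group is monophyletic iff it equals the leaf set of one clade."""
    return frozenset(members) in clades


def ungroup(groups: dict[str, set[str]], root: Node) -> tuple[dict[str, set[str]], list[str]]:
    """Keep groups that pass the monophyly criterion; report the names of removed groups."""
    clades = clade_leaf_sets(root)
    kept: dict[str, set[str]] = {}
    removed: list[str] = []
    for name, members in groups.items():
        if is_monophyletic(members, clades):
            kept[name] = members
        else:
            removed.append(name)
    return kept, removed


# Usage: tree ((A,B),(C,D)); G2 = {B, C} is not a clade, so it is ungrouped.
tree = Node(children=[
    Node(children=[Node("A"), Node("B")]),
    Node(children=[Node("C"), Node("D")]),
])
kept, removed = ungroup({"G1": {"A", "B"}, "G2": {"B", "C"}, "G3": {"A", "B", "C", "D"}}, tree)
print(sorted(kept), removed)  # ['G1', 'G3'] ['G2']
```

Storing a full leaf set for every node is simple but can use a lot of memory on a tree with more than 130,000 taxa. A production tool would more likely use bitsets or most-recent-common-ancestor queries. Other user-defined criteria could be added as extra predicates alongside `is_monophyletic`.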

[1] Gary L. Andersen, et al. High-Density Universal 16S rRNA Microarray Analysis Reveals Broader Diversity than Typical Clone Library When Sampling the Environment. 2007, Microbial Ecology.

[2] Eoin L. Brodie, et al. Greengenes, a Chimera-Checked 16S rRNA Gene Database and Workbench Compatible with ARB. 2006, Applied and Environmental Microbiology.

[3] Eoin L. Brodie, et al. Urban aerosols harbor diverse and dynamic bacterial populations. 2007, Proceedings of the National Academy of Sciences.

[4] C. Woese, et al. Bacterial evolution. 1987, Microbiological Reviews.

[5] Eoin L. Brodie, et al. Application of a High-Density Oligonucleotide Microarray Approach To Study Bacterial Population Dynamics during Uranium Reduction and Reoxidation. 2006, Applied and Environmental Microbiology.

[6] K. Schleifer, et al. ARB: a software environment for sequence data. 2004, Nucleic Acids Research.

[7] Derrick J. Zwickl. Genetic algorithm approaches for the phylogenetic analysis of large biological sequence datasets under the maximum likelihood criterion. 2006.

[8] Alexandros Stamatakis, et al. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. 2006, Bioinformatics.

[9] Eoin L. Brodie, et al. Loss of Bacterial Diversity during Antibiotic Treatment of Intubated Patients Colonized with Pseudomonas aeruginosa. 2006, Journal of Clinical Microbiology.

[10] W. Ludwig, et al. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. 2007, Nucleic Acids Research.

[11] Jaideep P. Sundaram, et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome". 2005, Proceedings of the National Academy of Sciences of the United States of America.

[12] M. Weizenegger, et al. Bacterial phylogeny based on comparative sequence analysis (review). 1998, Electrophoresis.

[13] John P. Huelsenbeck, et al. MrBayes 3: Bayesian phylogenetic inference under mixed models. 2003, Bioinformatics.