Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences

BackgroundA major goal of metagenomics is to characterize the microbial composition of an environment. The most popular approach relies on 16S rRNA sequencing, however this approach can generate biased estimates due to differences in the copy number of the gene between even closely related organisms, and due to PCR artifacts. The taxonomic composition can also be determined from metagenomic shotgun sequencing data by matching individual reads against a database of reference sequences. One major limitation of prior computational methods used for this purpose is the use of a universal classification threshold for all genes at all taxonomic levels.ResultsWe propose that better classification results can be obtained by tuning the taxonomic classifier to each matching length, reference gene, and taxonomic level. We present a novel taxonomic classifier MetaPhyler (http://metaphyler.cbcb.umd.edu), which uses phylogenetic marker genes as a taxonomic reference. Results on simulated datasets demonstrate that MetaPhyler outperforms other tools commonly used in this context (CARMA, Megan and PhymmBL). We also present interesting results by analyzing a real metagenomic dataset.ConclusionsWe have introduced a novel taxonomic classification method for analyzing the microbial diversity from whole-metagenome shotgun sequences. Compared with previous approaches, MetaPhyler is much more accurate in estimating the phylogenetic composition. In addition, we have shown that MetaPhyler can be used to guide the discovery of novel organisms from metagenomic samples.

[1]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[2]  J. Gordon,et al.  Commensal Host-Bacterial Relationships in the Gut , 2001, Science.

[3]  J. Handelsman,et al.  Metagenomics: genomic analysis of microbial communities. , 2004, Annual review of genetics.

[4]  J. Handelsman Metagenomics: Application of Genomics to Uncultured Microorganisms , 2004, Microbiology and Molecular Biology Reviews.

[5]  Edward M. Rubin,et al.  Metagenomics: DNA sequencing of environmental samples , 2005, Nature Reviews Genetics.

[6]  Mihai Pop,et al.  Minimus: a fast, lightweight genome assembler , 2007, BMC Bioinformatics.

[7]  S. Tringe,et al.  Quantitative Phylogenetic Assessment of Microbial Communities in Diverse Environments , 2007, Science.

[8]  I. Rigoutsos,et al.  Accurate phylogenetic classification of variable-length DNA fragments , 2007, Nature Methods.

[9]  Alexander F. Auch,et al.  MEGAN analysis of metagenomic data. , 2007, Genome research.

[10]  J. Tiedje,et al.  Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy , 2007, Applied and Environmental Microbiology.

[11]  Naryttza N. Diaz,et al.  Phylogenetic classification of short environmental DNA fragments , 2008, Nucleic acids research.

[12]  Philip Hugenholtz,et al.  A renaissance for the pioneering 16S rRNA gene. , 2008, Current opinion in microbiology.

[13]  J. Eisen,et al.  A simple, fast, and accurate method of phylogenomic inference , 2008, Genome Biology.

[14]  R. Knight,et al.  Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. , 2009, Genome research.

[15]  Mihai Pop,et al.  Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples , 2009, PLoS Comput. Biol..

[16]  M. Crowell,et al.  Human gut microbiota in obesity and after gastric bypass , 2009, Proceedings of the National Academy of Sciences.

[17]  S. Salzberg,et al.  Phymm and PhymmBL: Metagenomic Phylogenetic Classification with Interpolated Markov Models , 2009, Nature Methods.

[18]  Wolfgang Gerlach,et al.  WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads , 2009, BMC Bioinformatics.

[19]  B. Roe,et al.  A core gut microbiome in obese and lean twins , 2008, Nature.

[20]  Mihai Pop,et al.  Accurate and fast estimation of taxonomic profiles from metagenomic shotgun sequences , 2011, Genome Biology.