Metagenomic abundance estimation and diagnostic testing on species level

One goal of sequencing-based metagenomic community analysis is the quantitative taxonomic assessment of microbial community composition. In particular, relative quantification of taxa is highly relevant for metagenomic diagnostics and for comparing microbial communities. However, the majority of existing approaches quantify at low resolution (e.g. at the phylum level), rely on the existence of special genes (e.g. 16S), or have severe problems discerning species with highly similar genome sequences. Yet problems such as metagenomic diagnostics require accurate quantification at the species level. We developed Genome Abundance Similarity Correction (GASiC), a method that estimates true genome abundances via read alignment by accounting for reference genome similarities in a non-negative LASSO approach. We demonstrate GASiC's superior performance over existing methods on simulated benchmark data as well as on real data. In addition, we present applications to datasets from both bacterial DNA and viral RNA sources. We further discuss our approach as an alternative to PCR-based DNA quantification.
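The idea of correcting read counts for reference genome similarity can be illustrated with a small sketch. This is not the GASiC implementation; it is a minimal, pure-Python coordinate-descent solver for a penalized non-negative LASSO, written under several assumptions: the similarity matrix `sim`, the read counts, and the function name `nonneg_lasso` are all hypothetical, `sim[i][j]` is assumed to be the fraction of reads from genome `j` that align to reference `i`, and the penalized form with weight `lam` stands in for whatever exact formulation the method uses.

```python
def nonneg_lasso(sim, observed, lam=0.0, sweeps=1000):
    """Coordinate descent for
        min_c 0.5 * ||sim @ c - observed||^2 + lam * sum(c),  subject to c >= 0.
    sim[i][j]: assumed fraction of reads from genome j aligning to reference i.
    observed[i]: number of reads aligned to reference i.
    """
    n_refs, n_genomes = len(sim), len(sim[0])
    c = [0.0] * n_genomes
    residual = list(observed)  # observed - sim @ c, starting from c = 0
    col_norms = [sum(sim[i][j] ** 2 for i in range(n_refs))
                 for j in range(n_genomes)]
    for _ in range(sweeps):
        for j in range(n_genomes):
            if col_norms[j] == 0.0:
                continue  # genome j explains no reads; leave it at zero
            # Correlation of column j with the current residual
            corr = sum(sim[i][j] * residual[i] for i in range(n_refs))
            # Exact one-dimensional minimizer, clipped at zero
            new_cj = max(0.0, c[j] + (corr - lam) / col_norms[j])
            delta = new_cj - c[j]
            if delta != 0.0:
                for i in range(n_refs):
                    residual[i] -= sim[i][j] * delta
                c[j] = new_cj
    return c


# Hypothetical example: genomes 0 and 1 are highly similar, genome 2 is distinct.
sim = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
true_abundance = [60.0, 0.0, 40.0]  # genome 1 is absent from the sample

# Reads observed per reference: genome 1 collects reads through shared sequence.
observed = [sum(sim[i][j] * true_abundance[j] for j in range(3))
            for i in range(3)]

corrected = nonneg_lasso(sim, observed)
print("observed :", [round(x, 2) for x in observed])
print("corrected:", [round(x, 2) for x in corrected])
```

In this toy setting the raw counts suggest that the absent genome 1 is abundant, because it shares sequence with genome 0; the corrected estimate assigns those reads back to genome 0. A positive `lam` would additionally shrink small estimates toward zero.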
