MUSiCC: a marker genes based framework for metagenomic normalization and accurate profiling of gene abundances in the microbiome

Functional metagenomic analyses commonly involve a normalization step, in which the measured levels of genes or pathways are converted into relative abundances. Here, we demonstrate that this normalization scheme introduces marked biases both across and within human microbiome samples, and we identify sample- and gene-specific properties that contribute to these biases. We introduce an alternative normalization paradigm, MUSiCC, which combines universal single-copy genes with machine learning methods to correct these biases and to obtain an accurate and biologically meaningful measure of gene abundances. Finally, we demonstrate that MUSiCC significantly improves downstream discovery of functional shifts in the microbiome. MUSiCC is available at http://elbo.gs.washington.edu/software.html.
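To make the contrast between the two normalization schemes concrete, the following is a minimal sketch and not the MUSiCC implementation. It compares relative-abundance normalization with scaling by a set of universal single-copy genes. The gene names, the counts, and the use of the median marker abundance as the scaling factor are illustrative assumptions, and the machine learning correction step is not shown.

```python
from statistics import median


def relative_abundance(counts):
    """Convert raw gene counts into fractions of the sample total."""
    total = sum(counts.values())
    return {gene: count / total for gene, count in counts.items()}


def single_copy_normalize(counts, marker_genes):
    """Scale counts by the median abundance of single-copy marker genes.

    The result approximates an average copy number per genome.
    Using the median as the scaling factor is an assumption of this sketch.
    """
    marker_counts = [counts[g] for g in marker_genes if g in counts]
    if not marker_counts:
        raise ValueError("no marker genes found in sample")
    scale = median(marker_counts)
    if scale <= 0:
        raise ValueError("marker gene abundance must be positive")
    return {gene: count / scale for gene, count in counts.items()}


# Hypothetical samples: gene_x has the same per-genome copy number in both,
# but gene_y is far more abundant in sample_b.
markers = ["uscg_1", "uscg_2"]
sample_a = {"uscg_1": 10, "uscg_2": 10, "gene_x": 20, "gene_y": 60}
sample_b = {"uscg_1": 10, "uscg_2": 10, "gene_x": 20, "gene_y": 160}

rel_a = relative_abundance(sample_a)["gene_x"]
rel_b = relative_abundance(sample_b)["gene_x"]
per_genome_a = single_copy_normalize(sample_a, markers)["gene_x"]
per_genome_b = single_copy_normalize(sample_b, markers)["gene_x"]

print(rel_a, rel_b)                # relative abundance differs between samples
print(per_genome_a, per_genome_b)  # marker-scaled abundance is identical
```

In this toy example the relative abundance of `gene_x` halves between the two samples solely because `gene_y` increased, whereas the marker-scaled value stays the same. This is the kind of across-sample bias the abstract refers to.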
