Variability in Metagenomic Count Data and Its Influence on the Identification of Differentially Abundant Genes

Metagenomics is the study of microorganisms in environmental and clinical samples using high-throughput sequencing of random fragments of their DNA. Since metagenomics does not require any prior culturing of isolates, entire microbial communities can be studied directly in their natural state. In metagenomics, the abundance of genes is quantified by sorting and counting the DNA fragments. The resulting count data are high-dimensional and affected by high levels of technical and biological noise, which make the statistical analysis challenging. In this article, we introduce a hierarchical overdispersed Poisson model to explore the variability in metagenomic data. By analyzing three comprehensive data sets, we show that the gene-specific variability differs substantially between genes and depends on biological function. We also assess the power of identifying differentially abundant genes and show that incorrect assumptions about the gene-specific variability can lead to unacceptably high rates of false positives. Finally, we evaluate shrinkage approaches to improve the variance estimation and show that the choice of prior significantly affects the statistical power. The results presented in this study further elucidate the complex variance structure of metagenomic data and provide suggestions for accurate and reliable identification of differentially abundant genes.
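To make the notion of overdispersion concrete, the following is a minimal sketch and not the model fitted in this article. It assumes, purely for illustration, that each sample's Poisson rate is scaled by a gamma-distributed multiplier with mean 1; the function names, the `dispersion` parameter, and the chosen values are hypothetical. Under this assumption the variance of the counts is approximately mean + dispersion × mean², so the variance-to-mean ratio exceeds the value of 1 expected for plain Poisson counts.

```python
import math
import random
import statistics


def sample_poisson(rng, mean):
    """Draw one Poisson variate with Knuth's multiplication method (suitable for moderate means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_gene_counts(rng, base_mean, dispersion, n_samples):
    """Simulate counts for one hypothetical gene across samples.

    Each sample's Poisson rate is base_mean times a gamma multiplier with
    mean 1 and variance `dispersion` (an illustrative assumption);
    dispersion = 0 reduces to plain Poisson counts.
    """
    counts = []
    for _ in range(n_samples):
        if dispersion > 0:
            # shape = 1/dispersion, scale = dispersion -> mean 1, variance = dispersion
            multiplier = rng.gammavariate(1.0 / dispersion, dispersion)
        else:
            multiplier = 1.0
        counts.append(sample_poisson(rng, base_mean * multiplier))
    return counts


rng = random.Random(1)
poisson_counts = simulate_gene_counts(rng, base_mean=20.0, dispersion=0.0, n_samples=5000)
overdispersed_counts = simulate_gene_counts(rng, base_mean=20.0, dispersion=0.5, n_samples=5000)

for label, counts in [("Poisson", poisson_counts), ("overdispersed", overdispersed_counts)]:
    mean = statistics.fmean(counts)
    variance = statistics.variance(counts)
    print(f"{label}: mean={mean:.1f}, variance={variance:.1f}, variance/mean={variance / mean:.2f}")
```

A test that assumes Poisson variance for the second set of counts would understate the noise considerably, which is the kind of mismatch that can inflate false positives.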
