Average genome size estimation improves comparative metagenomics and sheds light on the functional ecology of the human microbiome

Average genome size is an important, yet often overlooked, property of microbial communities. We developed MicrobeCensus to rapidly and accurately estimate average genome size from shotgun metagenomic data and applied our tool to 1,352 human microbiome samples. We found that average genome size differs significantly within and between body sites and tracks with major functional and taxonomic differences. In the gut, average genome size is positively correlated with the abundance of Bacteroides and genes related to carbohydrate metabolism. Importantly, we found that average genome size variation can bias comparative analyses, and that normalization improves detection of differentially abundant genes.
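The bias described above can be illustrated with a small worked example. This is a minimal sketch, not the MicrobeCensus implementation: the function names, the toy read counts, and the choice to express abundance as reads per kilobase per genome equivalent are assumptions made for illustration. It shows a gene present once per genome in two communities that differ only in average genome size. The naive per-read abundance differs two-fold, while dividing by the number of genome equivalents sequenced removes the difference.

```python
def genome_equivalents(total_bases: float, avg_genome_size: float) -> float:
    """Number of average-sized genomes represented by the sequenced bases."""
    return total_bases / avg_genome_size


def naive_abundance(mapped_reads: float, total_reads: float) -> float:
    """Fraction of all reads that map to the gene (no genome size correction)."""
    return mapped_reads / total_reads


def normalized_abundance(mapped_reads: float, gene_length_bp: float,
                         total_bases: float, avg_genome_size: float) -> float:
    """Reads per kilobase of gene per genome equivalent (hypothetical helper)."""
    ge = genome_equivalents(total_bases, avg_genome_size)
    return mapped_reads / (gene_length_bp / 1000.0) / ge


# Toy data (assumed): both samples have 1e7 reads of 100 bp, and the gene is
# 1,000 bp long and present in one copy per genome in both communities.
TOTAL_READS = 1e7
TOTAL_BASES = TOTAL_READS * 100
GENE_LENGTH = 1000

# Sample A has a 2 Mbp average genome size, sample B has 4 Mbp, so the same
# single-copy gene attracts half as many reads in B.
sample_a = {"ags": 2e6, "mapped": 5000}
sample_b = {"ags": 4e6, "mapped": 2500}

naive_a = naive_abundance(sample_a["mapped"], TOTAL_READS)
naive_b = naive_abundance(sample_b["mapped"], TOTAL_READS)

norm_a = normalized_abundance(sample_a["mapped"], GENE_LENGTH,
                              TOTAL_BASES, sample_a["ags"])
norm_b = normalized_abundance(sample_b["mapped"], GENE_LENGTH,
                              TOTAL_BASES, sample_b["ags"])

print("naive:", naive_a, naive_b)
print("normalized:", norm_a, norm_b)
```

Under these assumed numbers, the naive abundances differ by a factor of two even though every cell in both communities carries exactly one copy of the gene, which is the kind of spurious difference that average genome size normalization is meant to remove.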
