An integrated metagenomics pipeline for strain profiling reveals novel patterns of transmission and global biogeography of bacteria

We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single nucleotide polymorphisms, from shotgun metagenomes. Our method leverages a database of >30,000 bacterial reference genomes, which we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare single nucleotide variants to reveal extensive vertical transmission of strains at birth, followed at later time points by colonization with strains unlikely to derive from the mother. This pattern was missed with species-level analysis, because the infant gut microbiome composition converges towards that of an adult over time. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes reveal extensive structure and dynamics that are obscured when data are analyzed at a coarser taxonomic resolution.
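The idea of using rare single nucleotide variants to track strain transmission can be illustrated with a minimal sketch. This is not the MIDAS implementation: the function names, the representation of an allele as a (position, nucleotide) pair, and the toy data are all assumptions made for illustration. The sketch treats an allele as a "marker" for the mother if it appears in her sample and in none of the unrelated samples, then reports the fraction of those markers that also appear in the infant.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: an allele is a (genomic position, nucleotide) pair.
Allele = Tuple[int, str]


def marker_alleles(focal: Set[Allele], unrelated: Iterable[Set[Allele]]) -> Set[Allele]:
    """Return alleles seen in the focal sample and in none of the unrelated samples."""
    seen_elsewhere: Set[Allele] = set()
    for sample in unrelated:
        seen_elsewhere |= sample
    return focal - seen_elsewhere


def sharing_fraction(
    mother: Set[Allele], infant: Set[Allele], unrelated: Iterable[Set[Allele]]
) -> float:
    """Fraction of the mother's marker alleles that are also present in the infant."""
    markers = marker_alleles(mother, unrelated)
    if not markers:
        return float("nan")  # no rare alleles, so sharing cannot be assessed
    return len(markers & infant) / len(markers)


# Toy data (invented for illustration only).
mother = {(10, "A"), (25, "T"), (40, "G"), (77, "C")}
unrelated = [{(10, "A")}, {(40, "G"), (90, "T")}]
infant_at_birth = {(10, "A"), (25, "T"), (77, "C")}
infant_at_12_months = {(10, "A"), (90, "T")}

early = sharing_fraction(mother, infant_at_birth, unrelated)
late = sharing_fraction(mother, infant_at_12_months, unrelated)
print(early, late)
```

In this toy case the infant carries all of the mother's marker alleles at birth and none at 12 months, which mirrors the qualitative pattern described above. A real analysis would also need to account for sequencing depth, sequencing error, and allele frequency thresholds, none of which are modeled here.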
