A new genomic blueprint of the human gut microbiota

The composition of the human gut microbiota is linked to health and disease, but knowledge of individual microbial species is needed to decipher their biological roles. Despite extensive culturing and sequencing efforts, the complete bacterial repertoire of the human gut microbiota remains undefined. Here we identify 1,952 uncultured candidate bacterial species by reconstructing 92,143 metagenome-assembled genomes from 11,850 human gut microbiomes. These uncultured genomes substantially expand the known species repertoire of the collective human gut microbiota, with a 281% increase in phylogenetic diversity. Although the newly identified species are less prevalent in well-studied populations compared to reference isolate genomes, they improve classification of understudied African and South American samples by more than 200%. These candidate species encode hundreds of newly identified biosynthetic gene clusters and possess a distinctive functional capacity that might explain their elusive nature. Our work expands the known diversity of uncultured gut bacteria, which provides unprecedented resolution for taxonomic and functional characterization of the intestinal microbiota.

The known species repertoire of the collective human gut microbiota is substantially expanded with the discovery of 1,952 uncultured bacterial species that greatly improve classification of understudied African and South American samples.
