A unified catalog of 204,938 reference genomes from the human gut microbiome

Comprehensive, high-quality reference genomes are required for functional characterization and taxonomic assignment of the human gut microbiota. We present the Unified Human Gastrointestinal Genome (UHGG) collection, comprising 204,938 nonredundant genomes from 4,644 gut prokaryotes. These genomes encode >170 million protein sequences, which we collated in the Unified Human Gastrointestinal Protein (UHGP) catalog. The UHGP contains more than twice as many gut proteins as the Integrated Gene Catalog. More than 70% of the UHGG species lack cultured representatives, and 40% of the UHGP proteins lack functional annotations. Analyses of intraspecies genomic variation revealed a large reservoir of accessory genes and single-nucleotide variants, many of which are specific to individual human populations. The UHGG and UHGP collections will enable studies linking genotypes to phenotypes in the human gut microbiome.
