Identification and assembly of genomes and genetic elements in complex metagenomic samples without using reference genomes

Most current approaches for analyzing metagenomic data rely on comparisons to reference genomes, but the microbial diversity of many environments extends far beyond what is covered by reference databases. De novo segregation of complex metagenomic data into specific biological entities, such as particular bacterial strains or viruses, remains a largely unsolved problem. Here we present a method, based on binning co-abundant genes across a series of metagenomic samples, that enables comprehensive discovery of new microbial organisms, viruses and co-inherited genetic entities and aids assembly of microbial genomes without the need for reference sequences. We demonstrate the method on data from 396 human gut microbiome samples and identify 7,381 co-abundance gene groups (CAGs), including 741 metagenomic species (MGS). We use these to assemble 238 high-quality microbial genomes and identify affiliations between MGS and hundreds of viruses or genetic entities. Our method provides the means for comprehensive profiling of the diversity within complex metagenomic samples.
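The core idea of grouping genes whose abundances rise and fall together across samples can be illustrated with a minimal sketch. This is not the authors' implementation: the greedy seed-based grouping, the Pearson correlation measure, the 0.9 threshold, the function names and the toy abundance profiles are all assumptions chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A flat profile carries no co-abundance signal.
        return 0.0
    return cov / sqrt(vx * vy)


def bin_coabundant_genes(profiles, min_corr=0.9):
    """Greedily group genes whose profiles correlate with a seed gene.

    profiles: dict mapping gene id -> list of abundances, one per sample.
    min_corr: hypothetical correlation threshold (an assumption).
    """
    remaining = list(profiles)
    groups = []
    while remaining:
        seed = remaining[0]
        # Collect the seed plus every remaining gene that tracks it.
        members = [
            g for g in remaining
            if g == seed or pearson(profiles[seed], profiles[g]) >= min_corr
        ]
        groups.append(members)
        remaining = [g for g in remaining if g not in members]
    return groups


# Toy data: four genes measured in four samples (made-up numbers).
profiles = {
    "gene_a": [1.0, 4.0, 2.0, 8.0],
    "gene_b": [2.1, 8.2, 3.9, 16.0],
    "gene_c": [9.0, 1.0, 7.0, 0.5],
    "gene_d": [18.0, 2.5, 14.0, 1.0],
}
groups = bin_coabundant_genes(profiles)
print(groups)
```

In this toy input, gene_a and gene_b vary together across samples, as do gene_c and gene_d, so the sketch places them in two separate groups. Genes from the same genome or genetic element are expected to share an abundance profile in this way, which is what makes the grouping possible without reference sequences.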
