COBRA improves the quality of viral genomes assembled from metagenomes

Microbial and viral diversity, distribution, and ecological impacts are often studied using metagenome-assembled sequences, but genome incompleteness hampers comprehensive and accurate analyses. Here we introduce COBRA (Contig Overlap Based Re-Assembly), a tool that resolves de Bruijn graph-based assembly breakpoints and joins contigs. Although COBRA is applicable to any DNA sequences assembled from short reads, we benchmarked it using a dataset of published complete viral genomes from the ocean. COBRA accurately joined contigs assembled by metaSPAdes, IDBA_UD, and MEGAHIT, outperforming several existing binning tools and achieving significantly higher genome accuracy (96.6% vs 19.8–59.6%). We applied COBRA to viral contigs that we assembled from 231 published freshwater metagenomes and obtained 7,334 high-quality or complete species-level genomes (clusters with 95% average nucleotide identity) for viruses of bacteria (phages), ∼83% of which represent new phage species. Notably, ∼70% of the 7,334 species genomes were circular, compared to 34% before COBRA analyses. We expanded genomic sampling of phages with genomes of ≥200 kbp (i.e., huge phages), the largest of which was curated to completion (717 kbp in length). The improved phage genomes from Rotsee Lake provided context for metatranscriptomic data and indicated in situ activity of huge phages and of WhiB- and cysC/cysH-encoding phages from this site. In conclusion, COBRA improves the assembly contiguity and completeness of microbial and viral genomes and thus the accuracy and reliability of analyses of gene content, diversity, and evolution.
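The idea of joining contigs through shared end sequences at assembly breakpoints can be illustrated with a minimal sketch. This is not COBRA's implementation: the function name `join_by_end_overlap`, the `overlap_len` parameter, and the toy sequences are hypothetical, and the sketch assumes that contigs broken at the same graph node share an exact end overlap of a known, fixed length. It ignores reverse complements and any additional evidence (such as read coverage or paired-end links) that a real tool would need to make reliable joins.

```python
from collections import defaultdict


def join_by_end_overlap(contigs, overlap_len):
    """Join contigs whose end exactly matches the start of one other contig.

    contigs: dict mapping contig name -> sequence (forward strand only).
    overlap_len: assumed length of the shared end overlap (hypothetical input).
    """
    # Index contigs by their first and last `overlap_len` bases.
    by_prefix = defaultdict(list)
    by_suffix = defaultdict(list)
    for name, seq in contigs.items():
        by_prefix[seq[:overlap_len]].append(name)
        by_suffix[seq[-overlap_len:]].append(name)

    # Keep only unambiguous links: exactly one contig ends with the overlap
    # and exactly one different contig starts with it.
    successor = {}
    for name, seq in contigs.items():
        end = seq[-overlap_len:]
        candidates = by_prefix.get(end, [])
        if len(candidates) == 1 and len(by_suffix[end]) == 1 and candidates[0] != name:
            successor[name] = candidates[0]

    has_predecessor = set(successor.values())
    visited = set()
    joined = {}

    def walk(start):
        # Follow successor links, appending each contig minus the shared overlap.
        path, seq, cur = [start], contigs[start], start
        visited.add(start)
        while cur in successor and successor[cur] not in visited:
            cur = successor[cur]
            seq += contigs[cur][overlap_len:]
            path.append(cur)
            visited.add(cur)
        joined["+".join(path)] = seq

    # Linear paths start at contigs with no predecessor.
    for name in contigs:
        if name not in has_predecessor:
            walk(name)
    # Anything left unvisited lies on a cycle (a candidate circular path).
    for name in contigs:
        if name not in visited:
            walk(name)
    return joined


toy = {"a": "ACGTAC", "b": "TACGGA", "c": "TTTTTT"}
result = join_by_end_overlap(toy, overlap_len=3)
print(result)
```

In this toy input, contig `a` ends with the same three bases that begin contig `b`, so the two are merged, while `c` has no partner and is returned unchanged. Restricting joins to unambiguous links is a deliberately conservative choice for the sketch; when several contigs share the same end, the overlap alone cannot decide which join is correct.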
