Phables: from fragmented assemblies to high-quality bacteriophage genomes

Motivation: Microbial communities influence both human health and a wide range of environments. Viruses that infect bacteria, known as bacteriophages or phages, play a key role in modulating these bacterial communities. High-quality phage genome sequences are essential for advancing our understanding of phage biology, enabling comparative genomics studies, and developing phage-based diagnostic tools. Most available viral identification tools consider individual sequences to determine whether they are of viral origin. Because viral assembly is difficult, genomes are often left fragmented, and new approaches to viral identification are needed. The identification and characterisation of novel phages therefore remain a challenge.

Results: We introduce Phables, a new computational method to resolve phage genomes from fragmented viral metagenome assemblies. Phables identifies phage-like components in the assembly graph, models each component as a flow network, and uses graph algorithms and flow decomposition techniques to identify genomic paths. Experimental results on viral metagenomic samples obtained from different environments show that Phables recovers on average over 49% more high-quality phage genomes than existing viral identification tools. Furthermore, Phables can resolve variant phage genomes with over 99% average nucleotide identity, a distinction that existing tools are unable to make.

Availability and Implementation: Phables is available on GitHub at https://github.com/Vini2/phables.

Contact: vijini.mallawaarachchi@flinders.edu.au
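To make the idea of decomposing a flow network into genomic paths concrete, the following is a minimal sketch on a toy graph. It is not Phables' implementation: it uses a simple greedy heuristic (repeatedly peel off the source-to-sink path with the largest bottleneck) in place of whatever decomposition Phables actually applies. The function names, the unitig labels (`s`, `a`, `b`, `c`, `t`), and the coverage values are all hypothetical, and the graph is assumed to be acyclic with integer edge flows that satisfy flow conservation.

```python
import heapq
from collections import defaultdict


def widest_path(residual, source, sink):
    """Return the source-to-sink path with the largest bottleneck flow, or None."""
    # Build adjacency lists from edges that still carry flow.
    adj = defaultdict(list)
    for (u, v), flow in residual.items():
        if flow > 0:
            adj[u].append((v, flow))

    best = {source: float("inf")}  # best bottleneck found so far per node
    parent = {}
    heap = [(-float("inf"), source)]  # max-heap via negated widths
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(u, 0):
            continue  # stale heap entry
        if u == sink:
            break
        for v, flow in adj[u]:
            candidate = min(width, flow)
            if candidate > best.get(v, 0):
                best[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (-candidate, v))

    if sink not in best:
        return None
    path = [sink]
    while path[-1] != source:
        path.append(parent[path[-1]])
    return path[::-1]


def greedy_flow_decomposition(edge_flows, source, sink):
    """Greedily split edge flows into weighted source-to-sink paths."""
    residual = dict(edge_flows)
    paths = []
    while True:
        path = widest_path(residual, source, sink)
        if path is None:
            break
        steps = list(zip(path, path[1:]))
        bottleneck = min(residual[step] for step in steps)
        for step in steps:
            residual[step] -= bottleneck  # remove this path's share of the flow
        paths.append((path, bottleneck))
    return paths


# Hypothetical component: two variant genomes that share unitig "c".
# Edge flows stand in for read coverage along each link.
toy_flows = {
    ("s", "a"): 30, ("a", "c"): 30,  # variant 1 only
    ("s", "b"): 10, ("b", "c"): 10,  # variant 2 only
    ("c", "t"): 40,                  # shared by both variants
}

for path, weight in greedy_flow_decomposition(toy_flows, "s", "t"):
    print(" -> ".join(path), "coverage", weight)
```

In this toy case the shared edge's flow of 40 is explained by two paths with weights 30 and 10, which mirrors how two closely related variant genomes passing through the same unitigs could be told apart by coverage. A greedy strategy like this does not guarantee the fewest possible paths in general; exact minimum decompositions are usually formulated as an optimisation problem instead.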
