VirSorter: mining viral signal from microbial genomic data

Viruses of microbes impact all ecosystems where microbes drive key energy and substrate transformations, including the oceans, humans and industrial fermenters. However, despite this recognized importance, our understanding of viral diversity and impacts remains limited by too few model systems and reference genomes. One way to fill these gaps in our knowledge of viral diversity is through the detection of viral signal in microbial genomic data. While multiple approaches have been developed and applied for the detection of prophages (viral genomes integrated in a microbial genome), new types of microbial genomic data are emerging that are more fragmented and larger scale, such as Single-cell Amplified Genomes (SAGs) of uncultivated organisms or genomic fragments assembled from metagenomic sequencing. Here, we present VirSorter, a tool designed to detect viral signal in these different types of microbial sequence data in both a reference-dependent and reference-independent manner, leveraging probabilistic models and extensive virome data to maximize detection of novel viruses. Performance testing shows that VirSorter’s prophage prediction capability is comparable to that of available prophage predictors for complete genomes, but is superior in predicting viral sequences outside of a host genome (i.e., from extrachromosomal prophages, lytic infections, or partially assembled prophages). Furthermore, VirSorter outperforms existing tools for fragmented genomic and metagenomic datasets, and can identify viral signal in assembled sequences (contigs) as short as 3 kb, while providing near-perfect identification (>95% Recall and 100% Precision) on contigs of at least 10 kb. Because VirSorter scales to large datasets, it can also be used in “reverse” to more confidently identify viral sequence in viral metagenomes by sorting away cellular DNA, whether derived from gene transfer agents, generalized transduction or contamination.
Finally, VirSorter is made available through the iPlant Cyberinfrastructure, which provides a web-based user interface interconnected with the required computing resources. VirSorter thus complements existing prophage prediction software to better leverage fragmented, SAG and metagenomic datasets in a way that will scale to modern sequencing. Given these features, VirSorter should enable the discovery of new viruses in microbial datasets, and further our understanding of uncultivated viral communities across diverse ecosystems.
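
The Recall and Precision figures quoted above depend on a minimum contig length. As a rough illustration of how such length-thresholded metrics are computed, here is a minimal sketch. It is not VirSorter's code or its benchmark: the `Contig` record, the `precision_recall` helper, and the toy contigs are all hypothetical and invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical benchmark record; not a VirSorter data structure."""
    name: str
    length: int            # contig length in base pairs
    is_viral: bool         # ground-truth label
    predicted_viral: bool  # label assigned by the tool under test


def precision_recall(contigs, min_length=0):
    """Return (precision, recall) over contigs of at least min_length bp."""
    kept = [c for c in contigs if c.length >= min_length]
    tp = sum(c.is_viral and c.predicted_viral for c in kept)
    fp = sum((not c.is_viral) and c.predicted_viral for c in kept)
    fn = sum(c.is_viral and (not c.predicted_viral) for c in kept)
    # Report NaN when a metric is undefined (no predictions / no true viral contigs).
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall


# Invented toy data, for illustration only.
toy_contigs = [
    Contig("contig_a", 12_000, True, True),
    Contig("contig_b", 15_000, True, False),
    Contig("contig_c", 20_000, False, False),
    Contig("contig_d", 4_000, False, True),
]

for threshold in (3_000, 10_000):
    p, r = precision_recall(toy_contigs, min_length=threshold)
    print(f"min_length={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Precision is the fraction of predicted viral contigs that are truly viral, and recall is the fraction of truly viral contigs that were detected. Raising `min_length` changes both, because short contigs are dropped before counting.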