Analysis of Next-generation Sequencing Data in Virology - Opportunities and Challenges

Viruses are the most abundant and the smallest organisms, and they are relatively simple to sequence. Genome sequence data for viruses, from individual species to populations, outnumber those for other species. Although this offers an opportunity to study viral diversity at varying levels of the taxonomic hierarchy, it also poses challenges for the systematic, structured organization of the data and its downstream processing. Extensive computational analyses using a number of algorithms and programs have opened exciting opportunities for virus discovery and diagnostics, apart from augmenting our understanding of the intriguing world of viruses. Unravelling the evolutionary dynamics of viruses improves understanding of phenomena such as quasispecies diversity and the role of mutations in host switching and drug resistance, which enables tangible measurement of viral genotype and phenotype. Improved understanding of genotype and serotype diversity in correlation with antigenic diversity will facilitate the rational design and development of efficacious vaccines against emerging and re-emerging viruses. Mathematical models developed using genomic data could be used to predict the spread of viruses due to vector switching and their (re)emergence due to host switching, and thereby contribute towards designing public health policies for disease management and control.
