VIGOR, an annotation program for small viral genomes

Background: The falling cost of sequencing and improvements in sequencing technology have made re-sequencing of large genomes and parallel sequencing of small genomes easier and more common. A small genome can now be completely sequenced within days, which increases the number of publicly available genomes. Among the genomes being rapidly sequenced are those of microbes and viruses responsible for infectious diseases. However, accurate gene prediction remains a challenge in decoding a newly sequenced genome. Accurate and efficient gene prediction programs are therefore highly desirable for rapid and cost-effective surveillance of RNA viruses through full genome sequencing.

Results: We have developed VIGOR (Viral Genome ORF Reader), a web application for gene prediction in influenza virus, rotavirus, rhinovirus and coronavirus subtypes. VIGOR detects protein-coding regions based on sequence similarity searches, can accurately detect genome-specific features such as frameshifts, overlapping genes and embedded genes, and can predict mature peptides within the context of a single polypeptide open reading frame. Genotyping capability for influenza and rotavirus is built into the program. We compared VIGOR to three previously described gene prediction programs: ZCURVE_V, GeneMarkS and FLAN. The specificity and sensitivity of VIGOR are greater than 99% for the RNA viral genomes tested.

Conclusions: VIGOR is a user-friendly web-based genome annotation program for five different viral agents: influenza, rotavirus, rhinovirus, coronavirus and SARS coronavirus. It is the first publicly accessible gene prediction program for rotavirus and rhinovirus. VIGOR accurately predicts protein-coding genes for these five viral types, can assign function to the predicted open reading frames, and can genotype influenza virus.
The prediction software was designed for performing high throughput annotation and closure validation in a post-sequencing production pipeline.
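
The abstract reports sensitivity and specificity above 99% but does not define them. As a minimal sketch, the following assumes a convention common in gene-finding evaluations: a predicted gene counts as correct only if its coordinates and strand match a reference gene exactly, sensitivity is TP / (TP + FN), and specificity is TP / (TP + FP). The function name, the tuple layout and the toy coordinates are hypothetical and are not taken from VIGOR.

```python
def gene_level_metrics(predicted, reference):
    """Return (sensitivity, specificity) for exact-match gene predictions.

    Both arguments are iterables of (start, end, strand) tuples.
    Assumption (not from the source): specificity is TP / (TP + FP),
    the convention often used when evaluating gene finders.
    """
    predicted = set(predicted)
    reference = set(reference)

    true_pos = len(predicted & reference)   # predicted genes that match the reference
    false_neg = len(reference - predicted)  # reference genes that were missed
    false_pos = len(predicted - reference)  # predictions with no reference match

    sensitivity = true_pos / (true_pos + false_neg) if reference else 0.0
    specificity = true_pos / (true_pos + false_pos) if predicted else 0.0
    return sensitivity, specificity


# Toy data: four reference genes, four predictions, one with a wrong start.
reference = [(1, 100, "+"), (150, 400, "+"), (420, 900, "-"), (950, 1200, "+")]
predicted = [(1, 100, "+"), (150, 400, "+"), (420, 900, "-"), (1000, 1200, "+")]

sensitivity, specificity = gene_level_metrics(predicted, reference)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f}")
```

Exact matching is the strictest choice. An evaluation that also credits partially overlapping predictions would give higher figures for the same data.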
