Curation of viral genomes: challenges, applications and the way forward

Background
Whole-genome sequence data is a step towards generating the 'parts list' of life and understanding the underlying principles of biocomplexity. Genome sequencing initiatives for human and model organisms are targeted efforts to understand the principles of evolution, with envisaged applications in improving human health. These efforts culminated in the development of dedicated resources. In contrast, a large number of viral genomes have been sequenced by groups or individuals interested in studying antigenic variation amongst strains and species. These independent efforts made viruses the 'best-represented taxa', with the highest number of genomes. However, for lack of concerted efforts, viral genomic sequences remained mere entries in public repositories until recently.

Results
VirGen is a curated resource of viral genomes and their analyses. Since its first release, it has grown in both its coverage of viral families and its modules for annotation and analysis. The current release (2.0) includes data for twenty-five families with a broad host range, as against eight in the first release. The taxonomic description of viruses in VirGen follows the ICTV nomenclature. A well-characterised strain is identified as the 'representative entry' for every viral species. This non-redundant dataset is used for subsequent annotation and analyses using sequence-based bioinformatics approaches. VirGen archives precomputed data on genome and proteome comparisons. A new data module that provides structures of viral proteins available in the PDB has recently been incorporated. A unique feature of VirGen is its set of predicted conformational and sequential epitopes of known antigenic proteins, obtained with algorithms developed in-house, a step towards reverse vaccinology.

Conclusion
Structured organization of genomic data facilitates the use of data mining tools, which provide opportunities for knowledge discovery. One approach to achieving this goal is to carry out functional annotation using comparative genomics. VirGen, a comprehensive viral genome resource that serves as an annotation and analysis pipeline, has been developed for the curation of public-domain viral genome data (http://bioinfo.ernet.in/virgen/virgen.html). The steps in the curation and annotation of the genomic data, and applications of the value-added derived data, are substantiated with case studies.
