In Silico Pattern-Based Analysis of the Human Cytomegalovirus Genome

ABSTRACT

More than 200 open reading frames (ORFs) in the human cytomegalovirus genome have been reported as potentially coding for proteins. We have used two pattern-based in silico approaches to analyze this set of putative viral genes. First, we have reannotated all of the previously reported putative genes of the human cytomegalovirus with an objective annotation method based on the Bio-Dictionary, a comprehensive collection of amino acid patterns that describes the currently known natural sequence space of proteins. Second, we have reexamined the original human cytomegalovirus gene family definitions with MUSCA, a pattern-based multiple sequence alignment algorithm. Our analysis shows that many of the encoded proteins contain amino acid combinations that are unique to either the human cytomegalovirus or the larger group of herpesviruses. We have confirmed that a surprisingly large portion of the analyzed ORFs encode membrane proteins, and we have discovered a significant number of previously uncharacterized proteins that are predicted to be G-protein-coupled receptor homologues. The analysis also indicates that many of the encoded proteins undergo posttranslational modifications such as hydroxylation, phosphorylation, and glycosylation. ORFs encoding proteins with similar functional behavior appear in neighboring regions of the genome. All results of the present study can be found and explored interactively online (http://cbcsrv.watson.ibm.com/virus/).
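The Bio-Dictionary annotation step rests on a simple idea: match dictionary patterns against a query protein and carry over the labels attached to the patterns that hit. The sketch below illustrates only that idea. It is not the published method. It assumes a simplified pattern syntax (uppercase amino acid letters, with '.' standing for any single residue). The `Seqlet` class, the `compile_seqlet` and `annotate` functions, and the labels are all hypothetical, and any scoring or filtering of hits that the real method performs is omitted.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical, simplified pattern syntax: uppercase amino acid letters,
# plus '.' standing for any single residue.
_VALID_PATTERN = re.compile(r"[A-Z.]+")


@dataclass(frozen=True)
class Seqlet:
    """Hypothetical dictionary entry: a pattern and the labels attached to it."""
    pattern: str
    labels: frozenset[str]


def compile_seqlet(pattern: str) -> re.Pattern[str]:
    """Turn a pattern into a regex that reports overlapping occurrences."""
    if not _VALID_PATTERN.fullmatch(pattern):
        raise ValueError(f"unsupported pattern: {pattern!r}")
    # '.' already means "any character" in a Python regex and letters match
    # literally, so the pattern is used as-is. Wrapping it in a capturing
    # lookahead makes each match zero-width, so overlapping hits are found.
    return re.compile(f"(?=({pattern}))")


def annotate(sequence: str, dictionary: list[Seqlet]) -> dict[str, list[tuple[int, int]]]:
    """Map each label to the (start, end) spans of the query matched by
    patterns that carry that label. No scoring or filtering is applied."""
    hits: dict[str, list[tuple[int, int]]] = {}
    for seqlet in dictionary:
        regex = compile_seqlet(seqlet.pattern)
        for match in regex.finditer(sequence):
            span = (match.start(1), match.end(1))
            for label in sorted(seqlet.labels):
                hits.setdefault(label, []).append(span)
    return hits


if __name__ == "__main__":
    # Toy dictionary with invented labels, for illustration only.
    entries = [
        Seqlet("C..C", frozenset({"label-A"})),
        Seqlet("GK.T", frozenset({"label-B"})),
    ]
    print(annotate("MCAACKGCGKSTW", entries))
    # prints {'label-A': [(1, 5), (4, 8)], 'label-B': [(8, 12)]}
```

The lookahead makes the two overlapping `C..C` occurrences (sharing the cysteine at position 4) both count as hits. A dictionary on the scale described above would presumably need an index over the patterns rather than one regex scan per entry.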
