Mining viral protease data to extract cleavage knowledge

MOTIVATION: The aim is to identify, using machine learning techniques, specific patterns in the amino acid residues of HIV and HCV viral polyproteins at the sites where the viral protease cleaves the polyprotein as it leaves the ribosome. Understanding viral protease specificity may help the development of future antiviral drugs based on protease inhibitors, because it can identify specific features of protease activity for further experimental investigation. Although viral sequence information is growing rapidly, there is still comparatively little understanding of how viral polyproteins are cut into their functional units. The work reported here investigates whether it is possible to generalise from known cleavage sites to unknown cleavage sites for two specific viruses: HIV and HCV. An understanding of proteolytic activity in specific viruses will contribute to our understanding of viral protease function in general, and thereby to a greater understanding of protease families and their substrate characteristics.

RESULTS: Our results show that both artificial neural networks and symbolic learning techniques (See5) capture some fundamental and new substrate attributes, but the neural networks outperform their symbolic counterpart.
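Generalising from known to unknown cleavage sites requires each candidate site to be turned into a fixed-length input that a neural network can consume. The sketch below illustrates one common way to do this, under stated assumptions: an eight-residue window (four residues either side of the scissile bond, P4 to P4' in Schechter-Berger notation) and one-hot encoding over the 20 standard amino acids. The window size, the encoding, the function names `extract_window` and `one_hot`, and the toy sequence are all hypothetical illustrations, not the encoding the authors report.

```python
# Hypothetical sketch: encode a candidate cleavage site as a one-hot vector.
# Assumes a window of 4 residues on each side of the scissile bond (P4..P4').

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one letter each
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def extract_window(sequence: str, bond_index: int, half_width: int = 4) -> str:
    """Return the residues flanking the bond between sequence[bond_index - 1]
    (P1) and sequence[bond_index] (P1'), half_width residues on each side."""
    start = bond_index - half_width
    end = bond_index + half_width
    if start < 0 or end > len(sequence):
        raise ValueError("window extends past the ends of the sequence")
    return sequence[start:end]


def one_hot(window: str) -> list[int]:
    """Concatenate one 20-bit indicator vector per residue.
    Raises KeyError for non-standard residue letters."""
    vector: list[int] = []
    for residue in window:
        bits = [0] * len(AMINO_ACIDS)
        bits[INDEX[residue]] = 1
        vector.extend(bits)
    return vector


if __name__ == "__main__":
    toy = "GGSQNYPIVQGG"                # toy sequence, for illustration only
    window = extract_window(toy, 6)     # bond between 'Y' (P1) and 'P' (P1')
    features = one_hot(window)
    print(window, len(features))        # an 8-residue window gives 160 features
```

Each labelled window (cleaved or not cleaved) then becomes one training example. A symbolic learner such as See5 could instead take the eight residues directly as discrete attributes, so the one-hot step is mainly needed on the neural network side.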
