Predicting human immunodeficiency virus protease cleavage sites in nonlinear projection space

HIV-1 protease has a broad and complex substrate specificity. An accurate, robust, and rapid method for predicting which sites in a protein are cleaved by HIV-1 protease would greatly expedite the search for inhibitors of HIV protease. Over the last two decades, various methods have been developed to explore the specificity of HIV protease cleavage. However, because understanding of HIV-1 protease cleavage site specificity has advanced little, little progress has been reported either in developing effective prediction methods or in achieving high prediction accuracy. In this article, a theoretical framework based on kernel methods is developed for dimensionality reduction and for prediction of HIV-1 protease cleavage sites. A nonlinear kernel method for dimensionality reduction, based on manifold learning, is proposed to reduce the high-dimensional feature representation of protease specificity, and a support vector machine is applied to predict cleavage. Numerical simulations show that the basic specificities of HIV-1 protease are preserved in the reduced feature space, and that combining the nonlinear dimensionality reduction algorithm with a support vector machine classifier yields better performance than previously published methods.
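The abstract does not give the pipeline's details, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes each candidate site is an octapeptide window (P4 through P4') with one-hot ("orthogonal") encoding into 160 binary features. Locally linear embedding (LLE) stands in for the paper's manifold-learning reduction step. A linear soft-margin SVM trained by Pegasos-style subgradient descent stands in for the paper's SVM, which may well use a nonlinear kernel. The function names `encode_octamers`, `lle_embed`, `train_linear_svm`, and `predict` are hypothetical, and the toy peptides and labeling rule are synthetic.

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from the paper.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_octamers(peptides):
    """One-hot encoding of P4..P4' windows: 8 positions x 20 residues = 160 features."""
    X = np.zeros((len(peptides), 8 * len(AMINO_ACIDS)))
    for row, pep in enumerate(peptides):
        if len(pep) != 8:
            raise ValueError(f"expected an octapeptide, got {pep!r}")
        for pos, aa in enumerate(pep):
            X[row, pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return X


def lle_embed(X, n_neighbors=15, n_components=10, reg=1e-3):
    """Locally linear embedding in plain NumPy (transductive: embeds only the rows of X)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                      # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :n_neighbors]

    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                           # neighbors relative to x_i
        C = Z @ Z.T                                     # local k x k Gram matrix
        C += np.eye(n_neighbors) * reg * max(np.trace(C), 1e-12)  # keep C invertible
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()                     # reconstruction weights sum to 1

    # Embedding = bottom eigenvectors of (I - W)^T (I - W), skipping the constant one.
    I_W = np.eye(n) - W
    _, vecs = np.linalg.eigh(I_W.T @ I_W)
    return vecs[:, 1:n_components + 1] * np.sqrt(n)     # scaled so Y^T Y / n = I


def train_linear_svm(X, y, lam=0.01, epochs=30, seed=0):
    """Soft-margin linear SVM via Pegasos-style stochastic subgradient descent; y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])      # constant column acts as the bias
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * (Xb[i] @ w) < 1.0        # hinge loss is active for this sample
            w *= 1.0 - eta * lam                        # step on the L2 regularizer
            if violates:
                w += eta * y[i] * Xb[i]                 # step on the hinge loss
    return w


def predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)


# Toy usage on synthetic peptides (NOT real cleavage data): label +1 when the P1
# residue (index 3) is F, L, M, or Y, a made-up rule for illustration only.
rng = np.random.default_rng(42)
peptides = ["".join(rng.choice(list(AMINO_ACIDS), size=8)) for _ in range(300)]
y = np.array([1 if p[3] in "FLMY" else -1 for p in peptides])

Y = lle_embed(encode_octamers(peptides))
train, test = np.arange(200), np.arange(200, 300)
w = train_linear_svm(Y[train], y[train])
y_hat = predict(w, Y[test])
print("toy test accuracy:", np.mean(y_hat == y[test]))
```

One design choice is worth noting. LLE has no built-in mapping for unseen points, so this sketch embeds all peptides at once and only then splits them into training and test sets. A practical predictor would need an out-of-sample extension, or a kernel-PCA-style formulation of the reduction step. The accuracy printed on random toy data says nothing about performance on real substrates.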
