HIV-1 protease cleavage site prediction based on two-stage feature selection method.

Knowledge of the mechanism of HIV protease cleavage specificity is critical to the design of specific and effective HIV inhibitors. An accurate, robust, and rapid method for predicting cleavage sites in proteins is therefore crucial in the search for possible HIV inhibitors. In this article, HIV-1 protease specificity was studied using the correlation-based feature subset (CfsSubset) selection method combined with a genetic algorithm. Thirty important biochemical features were selected, based on a jackknife test, from the original data set containing 4,248 features. Using the AdaBoost method with the thirty selected features, the prediction model yields an accuracy of 96.7% for the jackknife test and 92.1% for an independent set test, an increase in accuracy over the original dataset of 6.7% and 77.4%, respectively. Our feature selection scheme could be a useful technique for finding effective competitive inhibitors of HIV protease.
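The abstract does not spell out how the correlation-based subset score and the genetic search fit together, so the following is only a minimal sketch of the general idea, not the authors' implementation. It scores a feature subset with the standard CFS merit, k·r_cf / sqrt(k + k(k−1)·r_ff), and searches over subsets with a small genetic algorithm. Several things here are assumptions made for illustration: absolute Pearson correlation stands in for whatever correlation measure the study used, the function names (`pearson`, `cfs_merit`, `ga_select`), the GA settings, and the toy data are all hypothetical, and the AdaBoost classification and jackknife evaluation steps are not shown.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def cfs_merit(columns, labels, subset):
    """CFS merit of a subset: high feature-class correlation, low redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean absolute feature-class correlation (assumption: Pearson as the measure).
    r_cf = sum(abs(pearson(columns[j], labels)) for j in subset) / k
    if k == 1:
        return r_cf
    # Mean absolute feature-feature correlation over all pairs in the subset.
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)

def ga_select(columns, labels, pop_size=20, generations=30, p_mut=0.1, seed=0):
    """Search feature subsets (boolean masks) with a small elitist genetic algorithm."""
    rng = random.Random(seed)
    n = len(columns)

    def fitness(mask):
        return cfs_merit(columns, labels, [j for j in range(n) if mask[j]])

    # Start from the full feature set plus random masks, so the result is
    # never worse (by merit) than using every feature.
    pop = [[True] * n]
    pop += [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection, keeps the best
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return [j for j in range(n) if best[j]]

# Toy data (hypothetical): feature 0 equals the label, features 1 and 2 are uninformative.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
]
selected = ga_select(columns, labels)
print(selected, cfs_merit(columns, labels, selected))
```

In a two-stage scheme of the kind described above, a subset chosen this way would then be passed to a classifier such as AdaBoost and assessed by leave-one-out (jackknife) testing. Seeding the population with the full feature set is a design choice of this sketch: together with elitist selection, it guarantees the returned subset scores at least as well as no selection at all.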
