Feature Selection Combined with Neural Network Structure Optimization for HIV-1 Protease Cleavage Site Prediction

Understanding the specificity of HIV-1 protease is crucial for designing HIV-1 protease inhibitors. In this paper, a new feature selection method combined with neural network structure optimization is proposed to analyze the specificity of HIV-1 protease and to find the positions in an octapeptide that most strongly determine its cleavability. Two newly proposed kinds of features based on the Amino Acid Index database are used together with traditional orthogonal encoding features, so that both physicochemical and sequence information are taken into account. The feature selection results show that p2, p1, p1′, and p2′ are the most important positions. Two feature fusion methods, combination fusion and decision fusion, are used to obtain a comprehensive feature representation and to improve prediction performance. Decision fusion of the feature subsets obtained after feature selection achieves excellent prediction performance, which indicates that feature selection combined with decision fusion is an effective method for HIV-1 protease cleavage site prediction. The results and analysis in this paper can provide useful guidance for designing HIV-1 protease inhibitors in the future.
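As an illustration of the traditional orthogonal encoding mentioned above, the following minimal sketch maps each of the eight residues of an octapeptide to a 20-dimensional one-hot vector, giving 160 binary features. The names `AMINO_ACIDS`, `POSITIONS`, `orthogonal_encode`, and `majority_vote` are hypothetical and introduced only for this example. The majority vote shown is one common form of decision fusion, assumed here for illustration; it is not necessarily the fusion rule used in the paper.

```python
# Illustrative sketch only: names and the voting rule are assumptions,
# not taken from the paper.

# The 20 standard amino acids, ordered by one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Octapeptide positions; cleavage occurs between p1 and p1'.
POSITIONS = ["p4", "p3", "p2", "p1", "p1'", "p2'", "p3'", "p4'"]


def orthogonal_encode(octapeptide):
    """Return a 160-element 0/1 list: one 20-bit one-hot block per position."""
    if len(octapeptide) != len(POSITIONS):
        raise ValueError("expected a peptide of exactly 8 residues")
    vector = []
    for residue in octapeptide.upper():
        index = AMINO_ACIDS.find(residue)
        if index < 0:
            raise ValueError(f"unknown residue: {residue!r}")
        block = [0] * len(AMINO_ACIDS)
        block[index] = 1
        vector.extend(block)
    return vector


def majority_vote(labels):
    """Fuse 0/1 decisions from several classifiers by strict majority."""
    return 1 if 2 * sum(labels) > len(labels) else 0


if __name__ == "__main__":
    features = orthogonal_encode("SQNYPIVQ")
    print(len(features), sum(features))
    # Hypothetical decisions from three classifiers trained on different subsets.
    print(majority_vote([1, 1, 0]))
```

In such a scheme, each feature subset (for example, the blocks belonging to p2, p1, p1′, and p2′) would feed its own classifier, and the individual decisions would then be combined by the fusion rule.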
