Prediction of the Isoelectric Point of an Amino Acid Based on GA-PLS and SVMs

The support vector machine (SVM), a relatively new type of learning machine, was used for the first time to develop a quantitative structure-property relationship (QSPR) model relating the structures of 35 amino acids to their isoelectric points. The molecular structures were represented by descriptors calculated from the structure alone. Seven descriptors were selected with GA-PLS, a hybrid approach that combines a genetic algorithm (GA) as the optimization method with partial least squares (PLS) as a robust statistical method for variable selection. These descriptors served as inputs to radial basis function neural networks (RBFNNs) and to an SVM to predict the isoelectric point of an amino acid. The best QSPR model was the one based on the SVM, which gave a root-mean-square error of 0.2383 and a prediction correlation coefficient of R = 0.9702 for the whole data set. These results indicate that GA-PLS is an effective method for variable selection and that the SVM is a promising tool for nonlinear approximation.
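The two figures of merit quoted in the abstract, the root-mean-square error and the correlation coefficient R between observed and predicted values, can be illustrated with a minimal sketch. The function names `rmse` and `pearson_r` and the four sample values are assumptions made purely for illustration; they are not the authors' code or data, and the abstract does not state exactly how R was computed, so the ordinary Pearson correlation is assumed here.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed definition of R)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical numbers chosen only to exercise the functions;
# they are NOT isoelectric points from the paper.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.5, 7.5, 7.5]

print(f"RMSE = {rmse(observed, predicted):.4f}")
print(f"R    = {pearson_r(observed, predicted):.4f}")
```

A lower RMSE and an R closer to 1 both indicate closer agreement between predicted and observed values, which is how the reported 0.2383 and 0.9702 should be read.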
