SVM feature selection and sample regression for Chinese medicine research

This paper introduces SVM-based feature selection methods for the regression problem of predicting COX-2 inhibitor activity in Chinese medicine quantitative structure-activity relationship (QSAR) research. We develop a recursive SVM feature selection algorithm for regression and compare its performance with a genetic algorithm and the SVM recursive feature elimination (SVM-RFE) algorithm. Experiments on a real Chinese medicine dataset show that our method is fast and accurate for Chinese medicine regression problems with a small number of samples.
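To make the SVM-RFE baseline concrete, the following is a minimal sketch of recursive feature elimination driven by a linear support vector regressor. It is not the recursive SVM algorithm proposed in the paper, whose details are not given in the abstract. The function name `svr_rfe`, the parameter values, and the synthetic data are all assumptions for illustration; the sketch uses scikit-learn's `SVR` and assumes the descriptors are on comparable scales.

```python
import numpy as np
from sklearn.svm import SVR


def svr_rfe(X, y, n_keep, C=1.0, epsilon=0.1):
    """Backward elimination: repeatedly fit a linear SVR and drop the
    feature with the smallest squared weight (hypothetical helper)."""
    remaining = list(range(X.shape[1]))  # indices of surviving features
    while len(remaining) > n_keep:
        model = SVR(kernel="linear", C=C, epsilon=epsilon)
        model.fit(X[:, remaining], y)
        scores = model.coef_.ravel() ** 2  # ranking criterion w_i^2
        del remaining[int(np.argmin(scores))]  # remove the weakest feature
    return remaining


# Synthetic stand-in for a small-sample QSAR problem (assumed data):
# 40 samples, 8 descriptors, activity driven by descriptors 0 and 1 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.05 * rng.normal(size=40)

selected = svr_rfe(X, y, n_keep=2)
print("selected descriptor indices:", selected)
```

Removing one feature per iteration is the simplest schedule. With many descriptors, removing a fraction of the remaining features at each step would reduce the number of SVR fits at the cost of a coarser ranking.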
