An integrated scheme for feature selection and parameter setting in the support vector machine modeling and its application to the prediction of pharmacokinetic properties of drugs

OBJECTIVE Support vector machine (SVM), a statistical learning method, has recently been evaluated for predicting the absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties of new drugs. However, two problems remain in SVM modeling: feature selection and parameter setting. Both have been shown to have an important impact on the efficiency and accuracy of SVM classification. In particular, the choice of feature subset and the optimal SVM parameter settings influence each other, which suggests that they should be addressed simultaneously. In this paper, we propose an integrated scheme that handles feature subset choice and SVM parameter settings in concert.

METHOD In the proposed scheme, a genetic algorithm (GA) is used for feature selection and the conjugate gradient (CG) method for parameter optimization. Several classification models of ADMET-related properties were built to assess and test the integrated GA-CG-SVM scheme: (1) identification of P-glycoprotein substrates and nonsubstrates, (2) prediction of human intestinal absorption, (3) prediction of compounds inducing torsades de pointes, and (4) prediction of blood-brain barrier penetration.

RESULTS Compared with the results of previous SVM studies, our GA-CG-SVM approach significantly improves the overall prediction accuracy and uses fewer input features.

CONCLUSIONS Our results indicate that considering feature selection and parameter optimization simultaneously in SVM modeling can help to develop better predictive models for the ADMET properties of drugs.
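The nesting described under METHOD, an outer GA over feature subsets with an inner CG search over SVM parameters for each candidate subset, might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `cv_loss` is a hypothetical, deliberately simple surrogate standing in for an SVM cross-validation error, the parameters are taken to be (log C, log gamma), gradients are numerical, and the GA operators (one-point crossover, bit-flip mutation, elitism) and all numeric settings are illustrative choices.

```python
import random

INFORMATIVE = {0, 2}  # assumption: only these toy features carry signal

def cv_loss(mask, log_c, log_gamma):
    """Hypothetical surrogate for SVM cross-validation error (not a real SVM).
    The best (log C, log gamma) shifts with the subset size, mimicking the
    coupling between feature choice and parameter setting."""
    selected = {i for i, bit in enumerate(mask) if bit}
    missing = len(INFORMATIVE - selected)
    extra = len(selected - INFORMATIVE)
    k = len(selected)
    target_c, target_g = 1.0 + 0.5 * k, -1.0 - 0.25 * k
    return (0.3 * missing + 0.05 * extra
            + 0.5 * ((log_c - target_c) ** 2 + (log_gamma - target_g) ** 2))

def num_grad(f, x, h=1e-5):
    """Central-difference gradient of f at x."""
    g = []
    for i in range(len(x)):
        up, dn = x[:], x[:]
        up[i] += h
        dn[i] -= h
        g.append((f(up) - f(dn)) / (2 * h))
    return g

def conjugate_gradient(f, x0, iters=50, tol=1e-12):
    """Nonlinear CG (Polak-Ribiere+) with a backtracking Armijo line search."""
    x = list(x0)
    fx = f(x)
    g = num_grad(f, x)
    d = [-gi for gi in g]
    for _ in range(iters):
        gg = sum(gi * gi for gi in g)
        if gg < tol:
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0:  # not a descent direction: restart along -gradient
            d, slope = [-gi for gi in g], -gg
        step = 1.0
        while step > 1e-10:
            cand = [xi + step * di for xi, di in zip(x, d)]
            f_cand = f(cand)
            if f_cand <= fx + 1e-4 * step * slope:
                break
            step *= 0.5
        else:
            break  # no acceptable step found; keep the current point
        g_new = num_grad(f, cand)
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / gg)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, fx, g = cand, f_cand, g_new
    return x, fx

def fitness(mask):
    """Inner loop: tune the parameters for one feature subset."""
    params, loss = conjugate_gradient(
        lambda p: cv_loss(mask, p[0], p[1]), [0.0, 0.0])
    return loss, params

def ga_cg_search(n_features=6, pop_size=10, generations=15, p_mut=0.1, seed=0):
    """Outer loop: GA over bit masks, each scored after parameter tuning."""
    rng = random.Random(seed)
    pop = [[1] * n_features]  # all-features baseline
    pop += [[rng.randint(0, 1) for _ in range(n_features)]
            for _ in range(pop_size - 1)]
    best = None
    for _ in range(generations):
        scored = sorted((fitness(m) + (m,) for m in pop), key=lambda t: t[0])
        if best is None or scored[0][0] < best[0]:
            best = scored[0]
        parents = [m for _, _, m in scored[: pop_size // 2]]
        children = [best[2][:]]  # elitism: carry the best mask forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ 1 if rng.random() < p_mut else bit
                             for bit in child])
        pop = children
    return best  # (loss, [log_c, log_gamma], mask)

if __name__ == "__main__":
    loss, params, mask = ga_cg_search()
    print("mask:", mask, "params:", params, "loss:", loss)
```

In a real setting, `cv_loss` would be replaced by a cross-validated error (or a smooth approximation of it) from an actual SVM trained on the masked descriptors. Because every chromosome is scored only after its own parameter search, a feature subset is never penalized for being paired with parameters tuned for a different subset.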
