Prediction of P-Glycoprotein Substrates by a Support Vector Machine Approach

P-glycoproteins (P-gp) actively transport a wide variety of chemicals out of cells and function as drug efflux pumps that mediate multidrug resistance and limit the efficacy of many drugs. Methods that eliminate potential P-gp substrates early are therefore useful in new drug discovery. A computational ensemble pharmacophore model has recently been used to predict P-gp substrates with a promising accuracy of 63%. It is desirable to extend the prediction range beyond the compounds covered by known pharmacophore models. For this purpose, a machine learning method, the support vector machine (SVM), was explored for the prediction of P-gp substrates. A set of 201 chemical compounds, comprising 116 substrates and 85 nonsubstrates of P-gp, was used to train and test an SVM classification system. Under two different evaluation methods, this SVM system gave a prediction accuracy of at least 81.2% for P-gp substrates, a substantial improvement over the accuracy obtained from the multiple-pharmacophore model. The prediction accuracy for nonsubstrates of P-gp is 79.2% using 5-fold cross-validation. These accuracies are slightly better than those obtained from other statistical classification methods, including k-nearest neighbor (k-NN), probabilistic neural networks (PNN), and the C4.5 decision tree, applied to the same data sets and molecular descriptors. Our study indicates the potential of SVM for facilitating the prediction of P-gp substrates.
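The evaluation protocol described above, 5-fold cross-validation with separate accuracies for substrates and nonsubstrates, can be sketched in plain Python. This is a minimal illustration and not the authors' implementation: the function names, the stratified splitting scheme, the single synthetic descriptor, and the nearest-class-mean classifier standing in for the SVM are all assumptions made for the example. Only the class sizes (116 substrates, 85 nonsubstrates) come from the abstract.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin into the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def per_class_accuracy(y_true, y_pred):
    """Return (substrate accuracy, nonsubstrate accuracy); label 1 = substrate."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Pool held-out predictions over k folds and score each class separately."""
    y_true, y_pred = [], []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in held_out:
            y_true.append(y[i])
            y_pred.append(predict(model, X[i]))
    return per_class_accuracy(y_true, y_pred)


# Stand-in classifier (NOT an SVM): assign the class whose training mean is nearer.
def fit_mean(X, y):
    ones = [x for x, t in zip(X, y) if t == 1]
    zeros = [x for x, t in zip(X, y) if t == 0]
    return sum(zeros) / len(zeros), sum(ones) / len(ones)


def predict_mean(model, x):
    mean0, mean1 = model
    return 1 if abs(x - mean1) < abs(x - mean0) else 0


# Synthetic one-descriptor data; only the class sizes match the abstract.
rng = random.Random(1)
y = [1] * 116 + [0] * 85
X = [rng.gauss(1.0, 1.0) if t == 1 else rng.gauss(-1.0, 1.0) for t in y]

substrate_acc, nonsubstrate_acc = cross_validate(X, y, fit_mean, predict_mean)
print(f"substrate accuracy: {substrate_acc:.3f}")
print(f"nonsubstrate accuracy: {nonsubstrate_acc:.3f}")
```

Because `cross_validate` takes `fit` and `predict` as arguments, an actual SVM (or the k-NN, PNN, and C4.5 comparisons) could be substituted for the stand-in classifier without changing the splitting or scoring code.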
