Prediction of P2Y12 antagonists using a novel genetic algorithm-support vector machine coupled approach.

Here, a genetic algorithm (GA)-support vector machine (SVM) coupled approach is proposed for optimizing the subset of 2D molecular descriptors generated for a series of antagonists of P2Y12, a member of the G-protein-coupled receptor family, with the statistical performance and efficiency of the model simultaneously enhanced by the SVM kernel-based nonlinear projection. To our knowledge, this is the first QSAR study to predict P2Y12 inhibition activity from an unusually large dataset of 364 structurally diverse P2Y12 antagonists. In addition, three other widely used approaches, namely partial least squares (PLS), random forest (RF), and Gaussian process (GP) routines, each combined with GA (GA-PLS, GA-RF, and GA-GP, respectively), are also employed and compared with the GA-SVM method on several rigorous evaluation criteria. The results indicate that the GA-SVM model is a powerful tool for predicting P2Y12 antagonists, giving a conventional correlation coefficient R² of 0.976 and a cross-validated R²(cv) of 0.829 for the training set, as well as an R²(pred) of 0.811 for the test set. It significantly outperforms the other three methods, whose average values are R² = 0.894, R²(cv) = 0.741, and R²(pred) = 0.693. With excellent predictive capacity in both internal and external validation, the proposed model should be helpful for screening and optimizing potential P2Y12 antagonists prior to chemical synthesis in drug development.
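The GA side of such a coupled approach can be illustrated with a minimal sketch of descriptor-subset selection. The abstract does not give the GA settings, so everything specific here is an assumption: the function name `ga_select`, the population size, the crossover and mutation rates, tournament selection, and single-elite carry-over are all hypothetical choices. In a GA-SVM setting the `fitness` callback would typically return a cross-validated score of an SVM trained on the selected descriptors; a toy fitness stands in for it here so the example runs with the standard library alone.

```python
import random


def ga_select(n_features, fitness, pop_size=30, generations=40,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve a 0/1 mask over descriptors; fitness(mask) is maximized.

    All default settings are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    # Random initial population of binary descriptor masks.
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best_mask, best_score = None, float("-inf")
    history = []  # best score seen so far, recorded once per generation

    for _ in range(generations):
        scores = [fitness(m) for m in pop]
        # Remember the best mask evaluated so far.
        for m, s in zip(pop, scores):
            if s > best_score:
                best_mask, best_score = m[:], s
        history.append(best_score)

        def tournament():
            # Binary tournament: pick two individuals, keep the fitter one.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        next_pop = [best_mask[:]]  # elitism: carry the best mask forward
        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_features)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation toggles individual descriptors on or off.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop

    return best_mask, best_score, history


# Toy stand-in for a cross-validated SVM score (hypothetical): reward three
# "informative" descriptors and penalize large subsets.
INFORMATIVE = {0, 3, 5}


def toy_fitness(mask):
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return hits - 0.1 * sum(mask)


mask, score, history = ga_select(10, toy_fitness)
print(mask, round(score, 3))
```

Keeping the fitness function pluggable means the same loop could wrap PLS, RF, or GP models in place of the SVM, which mirrors the GA-PLS, GA-RF, and GA-GP comparison described above.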
