Prediction of PKCθ Inhibitory Activity Using the Random Forest Algorithm

This work predicts the inhibitory activity of a series of 208 structurally diverse PKCθ inhibitors using the Random Forest (RF) algorithm with Mold2 molecular descriptors. The resulting RF model proved to be a robust predictor of the experimental pIC50 values: for an external prediction set of 51 inhibitors that were not used in developing the QSAR models, it gave an external R2pred of 0.72 and a standard error of prediction (SEP) of 0.45. Using the RF built-in measure of relative descriptor importance, one descriptor, the number of group donor atoms for H-bonds (with N and O), was identified as playing a crucial role in PKCθ inhibitory activity. We hope that the developed RF model will be helpful in screening novel PKCθ inhibitors and predicting their activity.
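The two external-validation statistics quoted above can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions below are assumptions: R2pred is taken in the form common in the QSAR literature, 1 - PRESS/SS, with SS measured against the training-set mean, and SEP is taken as the square root of PRESS divided by the number of external compounds (some authors divide by n - 1 instead). The function names and the toy pIC50 values are hypothetical and are not data from this study.

```python
import math


def press(y_obs, y_pred):
    """Predictive residual sum of squares over the external set."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))


def r2_pred(y_obs, y_pred, y_train_mean):
    """Assumed definition: 1 - PRESS / SS, where SS is the sum of squared
    deviations of the external observations from the training-set mean."""
    ss = sum((o - y_train_mean) ** 2 for o in y_obs)
    return 1.0 - press(y_obs, y_pred) / ss


def sep(y_obs, y_pred):
    """Assumed definition: sqrt(PRESS / n), with n external compounds."""
    return math.sqrt(press(y_obs, y_pred) / len(y_obs))


# Toy values for illustration only (not taken from the paper).
y_external = [6.0, 7.0, 8.0, 5.0]   # hypothetical experimental pIC50
y_predicted = [6.5, 6.5, 7.5, 5.5]  # hypothetical model predictions
train_mean = 6.5                    # hypothetical training-set mean pIC50

print(round(r2_pred(y_external, y_predicted, train_mean), 3))
print(round(sep(y_external, y_predicted), 3))
```

Under these assumed definitions, R2pred rewards predictions that beat the training-set mean as a baseline, while SEP reports the typical prediction error in pIC50 units.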
