Kernel-Based Partial Least Squares: Application to Fingerprint-Based QSAR with Model Visualization

Numerous regression-based and machine learning techniques are available for developing linear and nonlinear QSAR models that accurately predict biological endpoints. Such tools can be powerful in the hands of an experienced modeler, but too often a disconnect remains between the modeler and the project chemist because the resulting QSAR models are effectively black boxes. Learning methods that yield models that can be visualized in the context of chemical structures are therefore in high demand. In this work, we combine direct kernel-based PLS with Canvas 2D fingerprints to obtain predictive QSAR models that can be projected onto the atoms of a chemical structure, allowing immediate identification of favorable and unfavorable characteristics. The method is validated using binding affinities for ligands from 10 different protein targets covering 7 distinct protein families. Models with significant predictive ability (test set Q² > 0.5) are obtained for 6 of the 10 data sets, and fingerprints are shown to consistently outperform large collections of classical physicochemical and topological descriptors. In addition, we demonstrate how a simple bootstrapping technique can be used to obtain uncertainties that provide meaningful estimates of prediction accuracy.
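To make the combination of a fingerprint kernel, PLS, and bootstrapped uncertainties concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Tanimoto kernel on binary fingerprints, treats the kernel matrix as an ordinary column-centered descriptor matrix for a hand-written PLS1 (NIPALS) fit, and takes the standard deviation over bootstrap models as the prediction uncertainty. The function names (`tanimoto_kernel`, `fit_pls1`, `bootstrap_predict`), the random stand-in fingerprints, and all parameter values are hypothetical; Canvas fingerprints and the atom-level visualization are not reproduced here.

```python
import numpy as np

def tanimoto_kernel(A, B):
    """Tanimoto similarity between the rows of two binary fingerprint matrices."""
    A, B = A.astype(float), B.astype(float)
    common = A @ B.T                                   # bits set in both
    union = A.sum(axis=1)[:, None] + B.sum(axis=1)[None, :] - common
    # Pairs of empty fingerprints (union == 0) are assigned similarity 0.
    return np.divide(common, union, out=np.zeros_like(common), where=union > 0)

def fit_pls1(X, y, n_components):
    """Single-response PLS via NIPALS; returns column means, y mean, coefficients."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean                      # centered residuals
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:                               # nothing left to explain
            break
        w /= norm
        t = E @ w                                      # scores
        tt = t @ t
        p = E.T @ t / tt                               # X loadings
        c = f @ t / tt                                 # y loading
        E = E - np.outer(t, p)                         # deflate X
        f = f - c * t                                  # deflate y
        W.append(w); P.append(p); q.append(c)
    if not W:
        return x_mean, y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)             # regression vector
    return x_mean, y_mean, coef

def bootstrap_predict(fp_train, y_train, fp_test, n_components=3, n_boot=50, seed=0):
    """Mean prediction and bootstrap standard deviation for each test compound."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = np.empty((n_boot, len(fp_test)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)               # resample with replacement
        K_train = tanimoto_kernel(fp_train[idx], fp_train[idx])
        K_test = tanimoto_kernel(fp_test, fp_train[idx])
        x_mean, y_mean, coef = fit_pls1(K_train, y_train[idx], n_components)
        preds[b] = (K_test - x_mean) @ coef + y_mean
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)

# Synthetic stand-in data: random 64-bit fingerprints, activity driven by 5 bits.
rng = np.random.default_rng(42)
fps = rng.integers(0, 2, size=(80, 64))
activity = fps[:, :5].sum(axis=1) + rng.normal(scale=0.2, size=80)
pred, sigma = bootstrap_predict(fps[:60], activity[:60], fps[60:])
print(np.column_stack([pred[:5], sigma[:5]]))          # prediction, uncertainty
```

In this sketch a large bootstrap standard deviation flags test compounds whose predictions change substantially with the composition of the training set, which is one simple way to read a per-compound uncertainty.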
