Prediction of Activity Cliffs Using Condensed Graphs of Reaction Representations, Descriptor Recombination, Support Vector Machine Classification, and Support Vector Regression

Activity cliffs (ACs) are formed by structurally similar compounds with large differences in activity. Accordingly, ACs are of high interest for the exploration of structure-activity relationships (SARs), because they reveal small chemical modifications that result in profound biological effects. The ability to foresee such small chemical changes with significant biological consequences would represent a major advance for drug design. Nevertheless, only a few attempts have been made so far to predict whether a pair of analogs is likely to represent an AC, and even fewer have gone further to quantitatively predict how "deep" a cliff might be. This might be because such predictions must focus on compound pairs rather than on individual compounds. Matched molecular pairs (MMPs), defined as pairs of structural analogs that are distinguished only by a chemical modification at a single site, are a preferred representation of ACs. Herein, we report new strategies for AC prediction that are based upon two different approaches: (i) condensed graphs of reactions, which were originally introduced for modeling of chemical reactions and were here adapted to encode MMPs, and (ii) plain descriptor recombination, a strategy used for quantitative structure-property relationship (QSPR) modeling of nonadditive mixtures (MQSPR). By applying these concepts, ACs were encoded as single descriptor vectors used as input for support vector machine (SVM) classification and support vector regression (SVR), yielding accurate predictions of AC status (i.e., cliff vs noncliff) and potency differences, respectively. The latter were predicted in a compound order-sensitive manner, returning the signed value of the expected potency difference between AC compounds.
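
The idea of encoding a compound pair as a single, order-sensitive descriptor vector can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `recombine` and `signed_potency_difference`, the toy descriptor values, and the specific choice of concatenating element-wise sums and differences are assumptions made for illustration only.

```python
from typing import List, Sequence


def recombine(d_a: Sequence[float], d_b: Sequence[float],
              signed: bool = True) -> List[float]:
    """Hypothetical pair encoding: concatenate element-wise sums and differences.

    With signed=True the difference block changes sign when the two
    compounds are swapped, so the pair vector is order-sensitive.
    With signed=False absolute differences are used and the vector
    is the same for either compound order.
    """
    if len(d_a) != len(d_b):
        raise ValueError("descriptor vectors must have equal length")
    sums = [a + b for a, b in zip(d_a, d_b)]
    diffs = [a - b for a, b in zip(d_a, d_b)]
    if not signed:
        diffs = [abs(x) for x in diffs]
    return sums + diffs


def signed_potency_difference(p_a: float, p_b: float) -> float:
    """Regression target for the ordered pair (A, B): potency(A) - potency(B)."""
    return p_a - p_b


# Toy fragment-count descriptors for two analogs (made-up values).
desc_a = [1.0, 0.0, 2.0]
desc_b = [0.0, 1.0, 2.0]

pair_ab = recombine(desc_a, desc_b)   # ordered pair (A, B)
pair_ba = recombine(desc_b, desc_a)   # ordered pair (B, A)
```

In this sketch, swapping the two compounds flips the sign of both the difference block and the regression target, which is one simple way a regressor trained on such vectors could return a signed potency difference. The resulting pair vectors would then be passed to any SVM or SVR implementation.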
