Implications of Additivity and Nonadditivity for Machine Learning and Deep Learning Models in Drug Design

Matched molecular pairs (MMPs) are now a widely used concept in drug design. They underpin many computational tools for structure–activity relationship analysis, biological activity prediction, and optimization of physicochemical properties. However, it has not yet been shown rigorously that the property change within an MMP, that is, between two molecules that differ by only one substituent, can be predicted with higher accuracy and precision than the change between an arbitrary pair of compounds. One would expect any model to predict such a well-defined change with high accuracy and reasonable precision. In this study, we examine the predictability of four classical properties relevant to drug design, ranging from simple physicochemical parameters (log D and solubility) to more complex cell-based ones (permeability and clearance), using different data sets and machine learning algorithms. Our study confirms that additive data are the easiest to predict. This highlights the importance of recognizing nonadditivity events and the difficulty of predicting properties in the case of scaffold hopping. Although deep learning is well suited to modeling nonlinear effects, these methods do not appear to be an exception to this observation. They generally perform better than classical machine learning methods, but the problem remains an open challenge for the field.
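To make the notions of an MMP and of nonadditivity concrete, the following is a minimal sketch, not the method used in this study. It assumes a hypothetical toy representation in which all compounds share one scaffold and are described only by the substituent at each attachment point; real MMP tools derive this by fragmenting full structures. Four compounds linked by two substituent changes form a double transformation cycle. If the changes are additive, the first change has the same effect on the parent and on the compound already carrying the second change, so the difference of the two effects is zero. A nonzero value flags nonadditivity. The sign convention here is a choice; the magnitude is what matters. The names `Compound`, `single_change`, `is_mmp`, and `nonadditivity` are illustrative only.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical toy representation (an assumption for illustration): all compounds
# share one scaffold and are described only by the substituent at each attachment
# point (R1, R2, ...). Real MMP tools derive this by fragmenting full structures.
@dataclass(frozen=True)
class Compound:
    name: str
    substituents: Tuple[str, ...]  # substituent at each attachment point
    value: float                   # measured property, e.g. log D


def single_change(a: Compound, b: Compound) -> Tuple[int, str, str]:
    """Return (position, old, new) if a and b form an MMP, else raise ValueError."""
    if len(a.substituents) != len(b.substituents):
        raise ValueError(f"{a.name} and {b.name} have different attachment points")
    diffs = [(i, x, y)
             for i, (x, y) in enumerate(zip(a.substituents, b.substituents))
             if x != y]
    if len(diffs) != 1:
        raise ValueError(f"{a.name} and {b.name} are not a matched molecular pair")
    return diffs[0]


def is_mmp(a: Compound, b: Compound) -> bool:
    """True if the two compounds differ at exactly one attachment point."""
    try:
        single_change(a, b)
        return True
    except ValueError:
        return False


def nonadditivity(parent: Compound, first: Compound,
                  second: Compound, both: Compound) -> float:
    """Nonadditivity of a double transformation cycle.

    parent -> first and second -> both must apply the same substituent change;
    parent -> second and first -> both must apply the other one. Under perfect
    additivity the first change has the same effect in both contexts, so the
    returned difference is 0.
    """
    if single_change(parent, first) != single_change(second, both):
        raise ValueError("first change differs between the two contexts")
    if single_change(parent, second) != single_change(first, both):
        raise ValueError("second change differs between the two contexts")
    effect_on_parent = first.value - parent.value
    effect_on_second = both.value - second.value
    return effect_on_second - effect_on_parent


if __name__ == "__main__":
    # Illustrative made-up values, not data from this study.
    a = Compound("A", ("H", "H"), 5.0)
    b = Compound("B", ("Cl", "H"), 6.0)
    c = Compound("C", ("H", "Me"), 5.5)
    d = Compound("D", ("Cl", "Me"), 7.5)
    print(nonadditivity(a, b, c, d))  # -> 1.0
```

In this made-up example, adding Cl raises the value by 1.0 on the parent but by 2.0 once Me is present, so the cycle is nonadditive by 1.0. A model that has only learned an average per-substituent increment would miss exactly this kind of case.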
