QSAR-assisted-MMPA to expand chemical transformation space for lead optimization

Matched molecular pairs analysis (MMPA) has become a powerful tool for automatically and systematically identifying medicinal chemistry transformations from compound/property datasets. However, the accurate determination of matched molecular pair (MMP) transformations depends largely on the size and quality of the available experimental data. A lack of high-quality experimental data severely hampers the extraction of effective medicinal chemistry knowledge. Here, we developed a new strategy called quantitative structure-activity relationship (QSAR)-assisted-MMPA to expand the number of chemical transformations, and took the logD7.4 property endpoint as an example to demonstrate the reliability of the new method. A reliable logD7.4 consensus prediction model was first established, and its applicability domain was strictly assessed. By applying this model to screen two chemical databases and retaining only the compounds that satisfied a strict applicability domain threshold, we obtained additional high-quality logD7.4 data. MMPA was then performed on both the predicted data and the experimental data to derive more chemical rules. To validate the reliability of these rules, we compared the magnitude and directionality of the property changes of the predicted rules with those of the measured rules. We also compared the novel chemical rules generated by the proposed approach with published chemical rules, and found that the magnitude and directionality of the property changes were consistent, indicating that the proposed QSAR-assisted-MMPA approach has the potential to enrich the collection of rule types or even identify completely novel rules. Finally, we found that the number of MMP rules derived from the experimental data could be amplified by the predicted data, which helps in analyzing medicinal chemistry rules in local chemical environments.
In summary, the proposed QSAR-assisted-MMPA approach could be regarded as a very promising strategy to expand the chemical transformation space for lead optimization, especially when there are not enough experimental data to support MMPA.
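
The workflow described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the applicability domain test shown here (agreement between the individual models of the consensus, with a hypothetical `max_spread` cutoff), the function names, the transformation labels, and all numerical values are assumptions chosen only to make the steps concrete. The sketch filters predicted compounds by the applicability domain, derives the mean logD7.4 change per transformation from measured and predicted data, and then compares direction and magnitude for the rules found in both sets.

```python
from collections import defaultdict
from statistics import mean, pstdev


def consensus(preds):
    """Return (mean, spread) of the individual model predictions for one compound."""
    return mean(preds), pstdev(preds)


def rule_deltas(pairs, logd):
    """Mean property change per transformation.

    pairs: iterable of (transformation, id_a, id_b) describing a change a -> b.
    Pairs with a compound missing from `logd` are skipped.
    """
    deltas = defaultdict(list)
    for transform, a, b in pairs:
        if a in logd and b in logd:
            deltas[transform].append(logd[b] - logd[a])
    return {t: mean(d) for t, d in deltas.items()}


def compare_rules(measured, predicted):
    """Compare direction and magnitude for rules present in both sets."""
    report = {}
    for t in sorted(set(measured) & set(predicted)):
        m, p = measured[t], predicted[t]
        report[t] = {"same_direction": m * p > 0, "abs_diff": abs(m - p)}
    return report


# Toy data: every value below is invented for illustration.
measured_logd = {"m1": 1.0, "m2": 1.3, "m3": 2.0, "m4": 1.2}
measured_pairs = [("H>>F", "m1", "m2"), ("H>>OH", "m3", "m4")]

# Outputs of three hypothetical individual models per screened compound.
raw_predictions = {
    "p1": [1.5, 1.6, 1.4],
    "p2": [1.8, 1.9, 1.7],
    "p3": [2.5, 2.4, 2.6],
    "p4": [1.6, 1.8, 1.7],
    "p5": [0.5, 2.0, 3.5],  # models disagree strongly
}
max_spread = 0.3  # hypothetical applicability-domain cutoff

# Keep only compounds whose consensus prediction is inside the assumed domain.
predicted_logd = {}
for cid, preds in raw_predictions.items():
    value, spread = consensus(preds)
    if spread <= max_spread:
        predicted_logd[cid] = value

predicted_pairs = [
    ("H>>F", "p1", "p2"),
    ("H>>OH", "p3", "p4"),
    ("H>>Cl", "p1", "p5"),   # dropped: p5 is outside the domain
    ("H>>CH3", "p2", "p3"),  # seen only in the predicted data
]

measured_rules = rule_deltas(measured_pairs, measured_logd)
predicted_rules = rule_deltas(predicted_pairs, predicted_logd)
report = compare_rules(measured_rules, predicted_rules)
novel_rules = sorted(set(predicted_rules) - set(measured_rules))

for transform, stats in report.items():
    print(transform, stats)
print("rules found only in predicted data:", novel_rules)
```

In this sketch, a rule counts as consistent when the predicted and measured mean changes share the same sign and differ only slightly in size. In practice, a statistical test over the full distribution of pair differences would be more appropriate than a comparison of means.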

[67]  Tingjun Hou,et al.  ADME Evaluation in Drug Discovery. 5. Correlation of Caco-2 Permeation with Simple Molecular Properties , 2004, J. Chem. Inf. Model..