Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery

Quantitative structure–activity relationship (QSAR) modelling is frequently used in the early stages of drug discovery to estimate the activity of a compound against one or more targets, and can also be used to assess the interaction of compounds with liability targets. QSAR models have been used successfully for these and related applications for many years. Conformal prediction is a relatively new QSAR approach that provides information on the certainty of a prediction, and so helps in decision-making. However, it is not always clear how best to make use of this additional information. In this article, we describe a case study that directly compares conformal prediction with traditional QSAR methods for large-scale predictions of target–ligand binding. The ChEMBL database was used to extract a data set comprising data from 550 human protein targets with different bioactivity profiles. For each target, a QSAR model and a conformal predictor were trained and their results compared. The models were then evaluated on new data published since the original models were built, to simulate a “real-world” application. The comparative study highlights the similarities between the two techniques, but also some differences that are important to bear in mind when the methods are used in practical drug discovery applications.
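
To make the idea of a prediction that carries its own certainty concrete, the sketch below shows how a class-conditional (Mondrian) inductive conformal predictor might turn an underlying model's class probabilities into a set of plausible labels for a binary active/inactive task. It is a minimal illustration under stated assumptions, not the method used in this study: the nonconformity score (1 minus the predicted probability of a label), the function names, the calibration scores and the test-compound probabilities are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def nonconformity(prob_of_label: float) -> float:
    """Assumed score: 1 minus the model's predicted probability for a label.

    Larger values mean the label fits the compound less well.
    """
    return 1.0 - prob_of_label


def conformal_p_value(cal_scores: List[float], test_score: float) -> float:
    """Non-smoothed conformal p-value: the fraction of calibration scores
    (counting the test example itself) that are at least as large as test_score."""
    n_at_least = sum(1 for s in cal_scores if s >= test_score)
    return (n_at_least + 1) / (len(cal_scores) + 1)


def mondrian_prediction_set(
    cal_scores_by_label: Dict[str, List[float]],
    test_probs: Dict[str, float],
    significance: float,
) -> Tuple[Dict[str, float], Set[str]]:
    """Mondrian (class-conditional) inductive conformal prediction.

    Each candidate label is compared only with calibration compounds that
    truly carry that label, which aims to keep the error rate valid within
    each class (under the usual exchangeability assumption). Labels whose
    p-value exceeds the significance level are kept in the prediction set.
    """
    p_values = {
        label: conformal_p_value(cal_scores, nonconformity(test_probs[label]))
        for label, cal_scores in cal_scores_by_label.items()
    }
    prediction = {label for label, p in p_values.items() if p > significance}
    return p_values, prediction


# Hypothetical calibration scores (1 - predicted probability of the true label),
# grouped by the true label of each calibration compound.
calibration = {
    "active": [0.10, 0.20, 0.30, 0.40],
    "inactive": [0.05, 0.10, 0.15, 0.20],
}
# Hypothetical underlying-model output for one new compound.
test_probs = {"active": 0.75, "inactive": 0.25}

p_values, at_20 = mondrian_prediction_set(calibration, test_probs, 0.20)
_, at_10 = mondrian_prediction_set(calibration, test_probs, 0.10)
print(p_values)  # {'active': 0.6, 'inactive': 0.2}
print(at_20)     # {'active'}
print(at_10)     # both labels: the compound cannot be assigned at this level
```

Where a traditional QSAR model would simply report the compound as active, the conformal output depends on the chosen significance level: at 0.2 the set contains a single label, at 0.1 it contains both labels, and at a strict enough level it can be empty, which flags a compound that looks unlike either class in the calibration data.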
