QSAR-derived affinity fingerprints (part 2): modeling performance for potency prediction

Affinity fingerprints report the activity of small molecules across a set of assays. They therefore make it possible to relate the bioactivities of structurally dissimilar compounds, a setting in which models based on chemical structure alone are often limited, and to model complex biological endpoints such as human toxicity and in vitro cancer cell line sensitivity. Here, we propose to model in vitro compound activity using computationally predicted bioactivity profiles as compound descriptors. To this end, we apply and validate a framework for calculating QSAR-derived affinity fingerprints (QAFFP) using a set of 1360 QSAR models generated from Ki, Kd, IC50 and EC50 data in the ChEMBL database. QAFFP thus represent a method to encode and relate compounds on the basis of their similarity in bioactivity space. To benchmark the predictive power of QAFFP, we assembled IC50 data from the ChEMBL database for 18 diverse cancer cell lines widely used in preclinical drug discovery and for 25 diverse protein target data sets. This study complements part 1, which evaluates the performance of QAFFP in similarity searching, scaffold hopping, and bioactivity classification. Although QAFFP are inherently noisy, we show that using them as descriptors leads to prediction errors on the test set in the range of ~0.65–0.95 pIC50 units, which is comparable to the estimated uncertainty of bioactivity data in ChEMBL (0.76–1.00 pIC50 units). We find that the predictive power of QAFFP is slightly worse than that of Morgan2 fingerprints and of 1D and 2D physicochemical descriptors, with an effect size in the range of 0.02–0.08 pIC50 units. Including QSAR models with low predictive power in the generation of QAFFP does not improve predictive power. Given that the QSAR models used to compute the QAFFP were selected on the basis of data availability alone, we anticipate better modeling results for QAFFP generated using more diverse and biologically meaningful targets.
Data sets and Python code are publicly available at https://github.com/isidroc/QAFFP_regression.
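
The idea of using predicted bioactivity profiles as descriptors can be illustrated with a minimal, self-contained Python sketch. It is not the code from the repository above, and every name in it is hypothetical: `make_linear_model` stands in for a trained QSAR model, `qaffp` collects the predictions of a bank of such models into one fingerprint, and `knn_predict` is a simple nearest-neighbour learner chosen only to show how similarity in bioactivity space can drive a potency estimate. The error metric is the root mean squared error in pIC50 units.

```python
import math
from typing import Callable, List, Sequence

# A "model" here is any callable mapping structural descriptors to a
# predicted activity (e.g. a pKi or pIC50 value). Hypothetical stand-in
# for a trained QSAR model.
Model = Callable[[Sequence[float]], float]


def make_linear_model(weights: Sequence[float], bias: float) -> Model:
    """Return a toy linear model (assumption: real QSAR models differ)."""
    def predict(x: Sequence[float]) -> float:
        return bias + sum(w * xi for w, xi in zip(weights, x))
    return predict


def qaffp(x: Sequence[float], models: Sequence[Model]) -> List[float]:
    """Affinity fingerprint: one predicted activity per model."""
    return [model(x) for model in models]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two fingerprints in bioactivity space."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(query_fp: Sequence[float],
                train_fps: Sequence[Sequence[float]],
                train_y: Sequence[float],
                k: int = 2) -> float:
    """Mean potency of the k training compounds nearest to the query."""
    ranked = sorted(range(len(train_fps)),
                    key=lambda i: euclidean(query_fp, train_fps[i]))
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the units of y (here pIC50)."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


if __name__ == "__main__":
    # Two toy models acting on two made-up structural descriptors.
    models = [make_linear_model([1.0, 0.0], 0.0),
              make_linear_model([0.0, 1.0], 1.0)]
    train_x = [[0.0, 0.0], [1.0, 0.0], [9.0, 9.0]]
    train_y = [5.0, 6.0, 9.0]  # made-up pIC50 values
    train_fps = [qaffp(x, models) for x in train_x]
    query_fp = qaffp([0.4, 0.0], models)
    print(knn_predict(query_fp, train_fps, train_y, k=2))
```

In this sketch the downstream learner never sees the structural descriptors directly; it only sees the fingerprint of predicted activities, which is the sense in which compounds are related by similarity in bioactivity space.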
