Crowdsourced mapping of unexplored target space of kinase inhibitors

Despite decades of intensive search for compounds that modulate the activity of particular proteins, small-molecule probes are currently available for only a small proportion of the human proteome. Effective approaches are therefore required to map the massive space of unexplored compound-target interactions for novel and potent activities. Here, we carried out a crowdsourced benchmarking of the accuracy of machine learning (ML) algorithms at predicting kinase inhibitor potencies across multiple kinase families. A total of 268 ML predictions were scored against unpublished bioactivity data sets. Top-performing algorithms used kernel learning, gradient boosting and deep learning, with predictive accuracy exceeding that of target activity assays. Subsequent experiments based on the top-performing model predictions demonstrated that these models and their ensemble can improve the accuracy of experimental mapping efforts, especially for under-studied kinases. The open-source ML algorithms, together with the novel dose-response data for 905 bioactivities between 95 compounds and 295 kinases, provide a unique resource for extending the druggable kinome.
