Profiling Prediction of Kinase Inhibitors: Toward the Virtual Assay.

Kinome-wide screening has the advantage of providing structure-activity relationships against hundreds of targets simultaneously. Here, we report the generation of ligand-based activity prediction models for over 280 kinases by applying machine learning methods to an extensive data set of proprietary bioactivity data combined with open data. High model quality (AUC > 0.7) was achieved for ∼200 kinases by (1) combining open with proprietary data, (2) choosing Random Forest over the alternative machine learning methods tested, and (3) balancing the training data sets. Tests on left-out and external data indicate that the models are of high value for virtual screening projects. Importantly, the derived models are evenly distributed across the kinome tree, allowing reliable profiling prediction for all kinase branches. Prediction quality was further improved by employing experimental bioactivity fingerprints of a small kinase subset. Overall, the generated models can support various hit identification tasks, including virtual screening, compound repurposing, and the detection of potential off-targets.
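The abstract names two ingredients that are easy to show in isolation: balancing the training data and judging a model by its AUC. The sketch below is an illustration only, not the authors' pipeline. It assumes balancing is done by random undersampling of the majority class (the abstract does not say which balancing scheme was used), and the function names `undersample_majority` and `roc_auc` are hypothetical. The Random Forest model itself is not shown.

```python
import random


def undersample_majority(records, seed=0):
    """Return a class-balanced list of (features, label) records.

    Assumption: balancing by random undersampling; labels are 1 (active)
    or 0 (inactive). The larger class is randomly reduced to the size of
    the smaller one.
    """
    rng = random.Random(seed)
    actives = [r for r in records if r[1] == 1]
    inactives = [r for r in records if r[1] == 0]
    n = min(len(actives), len(inactives))
    balanced = rng.sample(actives, n) + rng.sample(inactives, n)
    rng.shuffle(balanced)
    return balanced


def roc_auc(scores, labels):
    """Area under the ROC curve, computed pairwise.

    Equals the probability that a randomly chosen active receives a
    higher score than a randomly chosen inactive; ties count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Under this reading, a per-kinase model would count as high quality when `roc_auc` on held-out compounds exceeds 0.7, the threshold quoted in the abstract. The pairwise AUC is quadratic in the number of compounds, which is fine for a sketch but would be replaced by a rank-based computation on large data sets.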
