PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction

In silico drug-target interaction (DTI) prediction is an important and challenging problem in biomedical research, with substantial potential benefit to the pharmaceutical industry and to patients. Most existing DTI prediction methods, including deep learning models, use binary endpoints, which may oversimplify the problem, and they typically cannot handle cold-target problems, i.e., problems involving target proteins that never appeared in the training set. To address these limitations, we developed PADME (Protein And Drug Molecule interaction prEdiction), a framework based on deep neural networks that predicts real-valued interaction strength between compounds and proteins without requiring feature engineering. Because PADME takes both compound and protein information as inputs, it can handle cold-target (and cold-drug) problems. To our knowledge, we are the first to combine Molecular Graph Convolution (MGC) for compound featurization with protein descriptors for DTI prediction. We measured the performance of PADME on multiple datasets, including the ToxCast dataset, using multiple cross-validation split schemes and evaluation metrics, and PADME consistently outperformed the baseline methods. The results of a case study that predicts the binding affinity between various compounds and the androgen receptor (AR) suggest PADME's potential in drug development. PADME's scalability is a further advantage in the age of Big Data.
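The cold-target setting described above can be illustrated with a minimal sketch of a cross-validation split in which no protein in a test fold also appears in the corresponding training fold. This is not PADME's own splitting code: the function name `cold_target_folds`, the record layout, and the toy compound/protein identifiers and values are all hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, List, Tuple

# One record per measured pair: (compound_id, protein_id, interaction_strength).
# This layout is an assumption made for the sketch, not the paper's data format.
Pair = Tuple[str, str, float]


def cold_target_folds(
    pairs: List[Pair], n_folds: int = 5, seed: int = 0
) -> List[Tuple[List[Pair], List[Pair]]]:
    """Split records so that each test fold holds proteins unseen in its training fold."""
    # Assign whole proteins (not individual records) to folds.
    proteins = sorted({protein for _, protein, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(proteins)
    fold_of: Dict[str, int] = {p: i % n_folds for i, p in enumerate(proteins)}

    folds = []
    for k in range(n_folds):
        train = [r for r in pairs if fold_of[r[1]] != k]
        test = [r for r in pairs if fold_of[r[1]] == k]
        folds.append((train, test))
    return folds


# Hypothetical toy data: identifiers and values are placeholders, not real measurements.
pairs: List[Pair] = [
    ("cmpd_a", "prot_1", 5.2),
    ("cmpd_b", "prot_1", 6.1),
    ("cmpd_a", "prot_2", 7.4),
    ("cmpd_c", "prot_3", 4.9),
    ("cmpd_b", "prot_4", 8.0),
    ("cmpd_c", "prot_4", 5.5),
]

folds = cold_target_folds(pairs, n_folds=3)
for k, (train, test) in enumerate(folds):
    held_out = sorted({protein for _, protein, _ in test})
    print(f"fold {k}: {len(train)} train records, held-out proteins {held_out}")
```

Grouping by compound identifier instead of protein identifier would give the analogous cold-drug split. Because whole proteins are held out, a model evaluated this way must rely on protein information as an input, which is the property the abstract attributes to PADME.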
