DeepAffinity: Interpretable Deep Learning of Compound-Protein Affinity through Unified Recurrent and Convolutional Neural Networks

Motivation: Drug discovery demands rapid quantification of compound-protein interaction (CPI). However, methods that predict compound-protein affinity from sequences alone with high applicability, accuracy, and interpretability are lacking.

Results: We present a seamless integration of domain knowledge and learning-based approaches. Under novel representations of structurally annotated protein sequences, we propose a semi-supervised deep learning model that unifies recurrent and convolutional neural networks to exploit both unlabeled and labeled data, jointly encoding molecular representations and predicting affinities. Our representations and models outperform conventional options, achieving relative error in IC50 within 5-fold for test cases and within 10-fold for protein classes not included in training. Performance for new protein classes with few labeled data is further improved by transfer learning. Furthermore, an attention mechanism is embedded in our model to add interpretability, as illustrated in case studies for predicting and explaining selective drug-target interactions.

Availability: https://github.com/Shen-Lab/DeepAffinity

Contact: yshen@tamu.edu

Supplementary information: Supplementary data are available at http://shen-lab.github.io/deep-affinity-bioinf18-supp.pdf.
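The fold-error figures quoted in the Results can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the DeepAffinity code base: the helper names `pic50`, `fold_error`, and `within_fold` are hypothetical, and it assumes that "within k-fold" means the predicted and measured IC50 differ by a factor of at most k, which is equivalent to an absolute error of at most log10(k) in pIC50 = -log10(IC50 in molar).

```python
import math


def pic50(ic50_molar):
    """Convert an IC50 given in molar units to pIC50 = -log10(IC50)."""
    return -math.log10(ic50_molar)


def fold_error(ic50_pred, ic50_true):
    """Symmetric fold error: always >= 1, regardless of over- or under-prediction."""
    ratio = ic50_pred / ic50_true
    return max(ratio, 1.0 / ratio)


def within_fold(ic50_pred, ic50_true, fold):
    """True if the prediction is within `fold`-fold of the measurement.

    Assumption (not from the source): this is checked as an absolute
    pIC50 error no larger than log10(fold).
    """
    return abs(pic50(ic50_pred) - pic50(ic50_true)) <= math.log10(fold)


if __name__ == "__main__":
    # A 5-fold bound is about 0.70 log units; a 10-fold bound is exactly 1.
    print(round(math.log10(5), 2), math.log10(10))
    # Predicting 30 nM for a compound measured at 10 nM is a 3-fold error.
    print(fold_error(3e-8, 1e-8), within_fold(3e-8, 1e-8, 5))
```

Under this reading, a 5-fold bound corresponds to roughly 0.70 pIC50 units and a 10-fold bound to 1.0 pIC50 unit.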
