SAM-DTA: a sequence-agnostic model for drug-target binding affinity prediction

Drug-target binding affinity prediction is a fundamental task in drug discovery and has been studied for decades. Most methods follow the canonical paradigm of processing the protein (target) and the ligand (drug) inputs separately and then combining them. In this study, we show that, surprisingly, a model can achieve superior performance without access to any protein sequence information. Instead, a protein is characterized entirely by the ligands it interacts with. Specifically, we treat different proteins as separate prediction heads that are jointly trained in a multi-head manner, so as to learn a robust and universal ligand representation that generalizes across proteins. Empirical evidence shows that this paradigm outperforms its competitive sequence-based counterpart, DeepAffinity, with a Mean Squared Error (MSE) of 0.4261 versus 0.7612 and an R-squared of 0.7984 versus 0.6570. We also investigate the transfer learning scenario, in which unseen proteins are encountered after the initial training, and cross-dataset evaluation for prospective studies. The results reveal the robustness of the proposed model in generalizing to unseen proteins as well as in predicting future data. Source code and data are available at https://github.com/huzqatpku/SAM-DTA.
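
The multi-head idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `MultiHeadAffinityModel`, the toy fixed-length ligand feature vector, the single linear-plus-ReLU encoder, and the random untrained weights are all assumptions made for illustration, and the abstract does not specify the actual ligand encoder or training procedure. The sketch only shows the structural point that every ligand passes through one shared encoder, while each protein is represented solely by its own output head, with no sequence input. The `mse` and `r_squared` helpers use the common definitions of these metrics; the paper may compute R-squared differently.

```python
import random


class MultiHeadAffinityModel:
    """Toy sequence-agnostic model: shared ligand encoder, one head per protein.

    Hypothetical illustration only; weights are random and untrained.
    """

    def __init__(self, n_features, hidden_dim, protein_ids, seed=0):
        rng = random.Random(seed)
        # Shared encoder weights, applied to every ligand regardless of target.
        self.encoder = [
            [rng.uniform(-0.5, 0.5) for _ in range(n_features)]
            for _ in range(hidden_dim)
        ]
        # One linear head per protein; a protein is known only by its ID.
        self.heads = {
            pid: [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
            for pid in protein_ids
        }

    def encode(self, ligand):
        # Linear layer followed by ReLU, producing the shared representation.
        return [
            max(0.0, sum(w * x for w, x in zip(row, ligand)))
            for row in self.encoder
        ]

    def predict(self, ligand, protein_id):
        # Select the head for this protein and apply it to the shared features.
        hidden = self.encode(ligand)
        return sum(w * h for w, h in zip(self.heads[protein_id], hidden))


def mse(y_true, y_pred):
    """Mean squared error between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Usage: the same ligand is scored against two proteins via different heads.
model = MultiHeadAffinityModel(n_features=4, hidden_dim=3, protein_ids=["P1", "P2"])
ligand = [1.0, 0.0, 1.0, 0.0]  # hypothetical ligand feature vector
print(model.predict(ligand, "P1"), model.predict(ligand, "P2"))
```

In a trained version of such a model, each (ligand, protein) measurement would contribute loss only through the head of its own protein, while gradients from all proteins would update the shared encoder. Handling an unseen protein, as in the transfer learning scenario, would then amount to adding a new head on top of the already learned ligand representation.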
