WideDTA: prediction of drug-target binding affinity

Motivation: Predicting the interaction affinity between proteins and compounds is a major challenge in the drug discovery process. WideDTA is a deep-learning-based prediction model that uses chemical and biological textual sequence information to predict binding affinity.

Results: WideDTA uses four text-based information sources to predict binding affinity: the protein sequence, the ligand SMILES, protein domains and motifs, and maximum common substructure words. On the KIBA dataset, WideDTA outperformed DeepDTA, one of the state-of-the-art deep learning methods for drug-target binding affinity prediction, with statistical significance. This indicates that the word-based sequence representation adopted by WideDTA is a promising alternative to the character-based sequence representation used in deep learning models for binding affinity prediction such as DeepDTA. In addition, the results showed that, given the protein sequence and ligand SMILES, including protein domain and motif information and ligand maximum common substructure words does not provide additional useful information to the deep learning model. Interestingly, however, representing proteins by domain and motif information alone achieved performance similar to using the full protein sequence, suggesting that important binding-relevant information is contained within the protein motifs and domains.
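The contrast between character-based and word-based sequence representations can be illustrated with a minimal Python sketch. It assumes that "words" are overlapping fixed-length substrings of the input sequence; the helper names `char_tokens` and `word_tokens`, the window lengths (3 for the protein, 8 for the SMILES), and the example sequences are hypothetical choices for illustration and are not taken from the abstract.

```python
from typing import List


def char_tokens(seq: str) -> List[str]:
    """Character-based representation: one token per symbol."""
    return list(seq)


def word_tokens(seq: str, k: int) -> List[str]:
    """Word-based representation (assumed scheme): overlapping substrings of length k."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(seq) < k:
        # A sequence shorter than the window yields a single short word (or nothing).
        return [seq] if seq else []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Hypothetical inputs: a short peptide fragment and a small-molecule SMILES string.
protein = "MKVLAAG"
smiles = "CC(=O)OC1=CC=CC=C1"

protein_chars = char_tokens(protein)
protein_words = word_tokens(protein, 3)  # window length 3 is an assumption
smiles_words = word_tokens(smiles, 8)    # window length 8 is an assumption

print(protein_chars)
print(protein_words)
print(smiles_words[:3])
```

In a model of this kind, each distinct word would typically be mapped to an integer index and passed through an embedding layer, so the vocabulary is larger than in the character-based case but each token carries more local context.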

[1]  T. N. Bhat,et al.  The Protein Data Bank , 2000, Nucleic Acids Res..

[2]  Cathy H. Wu,et al.  UniProt: the Universal Protein knowledgebase , 2004, Nucleic Acids Res..

[3]  Amos Bairoch,et al.  PROSITE, a protein domain database for functional characterization and annotation , 2009, Nucleic Acids Res..

[4]  Marta M. Stepniewska-Dziubinska,et al.  Development and evaluation of a deep learning model for protein–ligand binding affinity prediction , 2017, Bioinform..

[5]  Tao Xu,et al.  Making Sense of Large-Scale Kinase Inhibitor Bioactivity Data Sets: A Comparative and Integrative Analysis , 2014, J. Chem. Inf. Model..

[6]  Shuigeng Zhou,et al.  Boosting compound-protein interaction prediction by deep learning , 2015, 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[7]  M. Gonen,et al.  Concordance probability and discriminatory power in proportional hazards regression , 2005 .

[8]  Noel M. O'Boyle,et al.  DeepSMILES: An Adaptation of SMILES for Use in Machine-Learning of Chemical Structures , 2018 .

[9]  R. A. Leibler,et al.  On Information and Sufficiency , 1951 .

[10]  Yanli Wang,et al.  PubChem BioAssay: 2017 update , 2016, Nucleic Acids Res..

[11]  Ivan G. Costa,et al.  A multiple kernel learning algorithm for drug-target interaction prediction , 2016, BMC Bioinformatics.

[12]  K. Chou,et al.  Predicting Drug-Target Interaction Networks Based on Functional Groups and Biological Features , 2010, PloS one.

[13]  Marta M. Stepniewska-Dziubinska,et al.  Development and evaluation of a deep learning model for protein-ligand binding affinity prediction , 2017, 1712.07042.

[14]  Olac Fuentes,et al.  DLSCORE: A Deep Learning Model for Predicting Protein-Ligand Binding Affinities , 2018 .

[15]  Yong Zhou,et al.  A Computational-Based Method for Predicting Drug-Target Interactions by Using Stacked Autoencoder Deep Neural Network , 2017, J. Comput. Biol..

[16]  Yanli Wang,et al.  PubChem: Integrated Platform of Small Molecules and Biological Activities , 2008 .

[17]  Ehsaneddin Asgari,et al.  Continuous Distributed Representation of Biological Sequences for Deep Proteomics and Genomics , 2015, PloS one.

[18]  Vijay S. Pande,et al.  Atomic Convolutional Networks for Predicting Protein-Ligand Binding Affinity , 2017, ArXiv.

[19]  Izhar Wallach,et al.  AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery , 2015, ArXiv.

[20]  Yoshihiro Yamanishi,et al.  Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework , 2010, Bioinform..

[21]  Arzucan Özgür,et al.  A novel methodology on distributed representations of proteins using their interacting ligands , 2018, Bioinform..

[22]  Andreas Mayr,et al.  Deep Learning as an Opportunity in Virtual Screening , 2015 .

[23]  Artem Cherkasov,et al.  SimBoost: a read-across approach for predicting drug–target binding affinities using gradient boosting machines , 2017, Journal of Cheminformatics.

[24]  Mindy I. Davis,et al.  Comprehensive analysis of kinase inhibitor selectivity , 2011, Nature Biotechnology.

[25]  Maciej Eder,et al.  Linguistic measures of chemical diversity and the “keywords” of molecular collections , 2018, Scientific Reports.

[26]  Arzucan Özgür,et al.  DeepDTA: deep drug–target binding affinity prediction , 2018, Bioinform..

[27]  Martín Abadi,et al.  TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems , 2016, ArXiv.

[28]  Hugo Ceulemans,et al.  Large-scale comparison of machine learning methods for drug target prediction on ChEMBL , 2018, Chemical science.

[29]  Hojung Nam,et al.  SELF-BLM: Prediction of drug-target interactions via self-training SVM , 2017, PloS one.

[30]  Chunyan Miao,et al.  Neighborhood Regularized Logistic Matrix Factorization for Drug-Target Interaction Prediction , 2016, PLoS Comput. Biol..

[31]  Tapio Pahikkala,et al.  Toward more realistic drug^target interaction predictions , 2014 .

[32]  David S. Wishart,et al.  DrugBank: a comprehensive resource for in silico drug discovery and exploration , 2005, Nucleic Acids Res..

[33]  David Vidal,et al.  LINGO, an Efficient Holographic Text Based Method To Calculate Biophysical Properties and Intermolecular Similarities , 2005, J. Chem. Inf. Model..

[34]  Anshul Kundaje,et al.  Prediction of protein-ligand interactions from paired protein sequence motifs and ligand substructures , 2018, PSB.

[35]  Artem Cherkasov,et al.  PADME: A Deep Learning-based Framework for Drug-Target Interaction Prediction , 2018, ArXiv.

[36]  Ping Zhang,et al.  Interpretable Drug Target Prediction Using Deep Neural Representation , 2018, IJCAI.