MolTrans: Molecular Interaction Transformer for drug–target interaction prediction

MOTIVATION: Drug–target interaction (DTI) prediction is a foundational task for in-silico drug discovery, a process that is costly and time-consuming because it requires experimental search over a large space of drug compounds. Recent years have seen promising progress in applying deep learning to DTI prediction. However, the following challenges remain open: (1) existing molecular representation learning approaches ignore the sub-structural nature of DTI and thus produce results that are less accurate and difficult to explain; (2) existing methods focus on limited labeled data while ignoring the value of massive unlabeled molecular data.

RESULTS: We propose a Molecular Interaction Transformer (MolTrans) to address these limitations via: (1) a knowledge-inspired sub-structural pattern mining algorithm and an interaction modeling module for more accurate and interpretable DTI prediction; (2) an augmented transformer encoder to better capture the semantic relations among substructures extracted from massive unlabeled biomedical data. We evaluate MolTrans on real-world data and show that it improves DTI prediction performance compared to state-of-the-art baselines.

AVAILABILITY: The model scripts are available at https://github.com/kexinhuang12345/moltrans.

CONTACT: jimeng@illinois.edu.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
