Molecule Attention Transformer

Designing a single neural network architecture that performs competitively across a range of molecule property prediction tasks remains largely an open challenge, and its solution may unlock widespread use of deep learning in the drug discovery industry. To move towards this goal, we propose the Molecule Attention Transformer (MAT). Our key innovation is to augment the Transformer's attention mechanism with inter-atomic distances and the molecular graph structure. Experiments show that MAT performs competitively on a diverse set of molecular prediction tasks. Most importantly, with simple self-supervised pretraining, MAT requires tuning only a few hyperparameter values to achieve state-of-the-art performance on downstream tasks. Finally, we show that the attention weights learned by MAT are interpretable from a chemical point of view.
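The abstract does not state how distances and graph structure enter the attention computation. As a minimal sketch, assume the augmented attention weights are a weighted sum of three terms: standard scaled dot-product attention, a row-normalised function of the inter-atomic distance matrix, and the adjacency matrix of the molecular graph. The function name `molecule_attention`, the mixing weights `lam_att`, `lam_dist`, `lam_adj`, and the choice of `softmax(-dist)` as the distance term are illustrative assumptions, not details taken from the text above.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def molecule_attention(Q, K, V, dist, adj,
                       lam_att=0.5, lam_dist=0.25, lam_adj=0.25):
    """Single-head attention over atoms, mixed with distance and graph terms.

    Q, K, V : (n_atoms, d) query, key and value matrices
    dist    : (n_atoms, n_atoms) inter-atomic distances
    adj     : (n_atoms, n_atoms) adjacency matrix of the molecular graph
    lam_*   : hypothetical mixing weights (assumed hyperparameters)
    """
    d_k = Q.shape[-1]
    # Standard scaled dot-product attention weights.
    att = softmax(Q @ K.T / np.sqrt(d_k), axis=-1)
    # Assumed distance term: closer atoms receive larger weights.
    dist_term = softmax(-dist, axis=-1)
    # Weighted sum of the three sources of structure.
    weights = lam_att * att + lam_dist * dist_term + lam_adj * adj
    return weights @ V, weights
```

Setting `lam_att=1.0` and the other two weights to `0.0` recovers plain self-attention, which makes the contribution of the distance and graph terms easy to isolate in an ablation.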
