Graph-Aware Transformer: Is Attention All Graphs Need?

Graphs are the natural data structure to represent relational and structural information in many domains. To cover the broad range of graph-data applications including graph classification as well as graph generation, it is desirable to have a general and flexible model consisting of an encoder and a decoder that can handle graph data. Although the representative encoder-decoder model, Transformer, shows superior performance in various tasks especially of natural language processing, it is not immediately available for graphs due to their non-sequential characteristics. To tackle this incompatibility, we propose GRaph-Aware Transformer (GRAT), the first Transformer-based model which can encode and decode whole graphs in end-to-end fashion. GRAT is featured with a self-attention mechanism adaptive to the edge information and an auto-regressive decoding mechanism based on the two-path approach consisting of sub-graph encoding path and node-and-edge generation path for each decoding step. We empirically evaluated GRAT on multiple setups including encoder-based tasks such as molecule property predictions on QM9 datasets and encoder-decoder-based tasks such as molecule graph generation in the organic molecule synthesis domain. GRAT has shown very promising results including state-of-the-art performance on 4 regression tasks in QM9 benchmark.

[1]  Vijay S. Pande,et al.  MoleculeNet: a benchmark for molecular machine learning , 2017, Chemical science.

[2]  Stefano Ermon,et al.  Graphite: Iterative Generative Modeling of Graphs , 2018, ICML.

[3]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[4]  Alex Fout,et al.  Protein Interface Prediction using Graph Convolutional Networks , 2017, NIPS.

[5]  Christopher A. Hunter,et al.  Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction , 2018, ACS central science.

[6]  Connor W. Coley,et al.  A graph-convolutional neural network model for the prediction of chemical reactivity , 2018, Chemical science.

[7]  Cyrus Shahabi,et al.  Diffusion Convolutional Recurrent Neural Network: Data-Driven Traffic Forecasting , 2017, ICLR.

[8]  Richard S. Zemel,et al.  Gated Graph Sequence Neural Networks , 2015, ICLR.

[9]  F. Scarselli,et al.  A new model for learning in graph domains , 2005, Proceedings. 2005 IEEE International Joint Conference on Neural Networks, 2005..

[10]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[11]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[12]  Constantine Bekas,et al.  “Found in Translation”: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models† †Electronic supplementary information (ESI) available: Time-split test set and example predictions, together with attention weights, confidence and token probabilities. See DO , 2017, Chemical science.

[13]  Daniel W. Davies,et al.  Machine learning for molecular and materials science , 2018, Nature.

[14]  Boris Ginsburg,et al.  Mixed-Precision Training for NLP and Speech Recognition with OpenSeq2Seq , 2018, 1805.10387.

[15]  Yue Wang,et al.  Dynamic Graph CNN for Learning on Point Clouds , 2018, ACM Trans. Graph..

[16]  Jure Leskovec,et al.  Inductive Representation Learning on Large Graphs , 2017, NIPS.

[17]  Peng Xu,et al.  Multigraph Transformer for Free-Hand Sketch Recognition , 2019, IEEE Transactions on Neural Networks and Learning Systems.

[18]  Yinhai Wang,et al.  Traffic Graph Convolutional Recurrent Neural Network: A Deep Learning Framework for Network-Scale Traffic Learning and Forecasting , 2018, IEEE Transactions on Intelligent Transportation Systems.

[19]  Max Welling,et al.  Variational Graph Auto-Encoders , 2016, ArXiv.

[20]  James Henderson,et al.  Graph-to-Graph Transformer for Transition-based Dependency Parsing , 2019, FINDINGS.

[21]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[22]  Cao Xiao,et al.  Constrained Generation of Semantically Valid Graphs via Regularizing Variational Autoencoders , 2018, NeurIPS.

[23]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Alán Aspuru-Guzik,et al.  Convolutional Networks on Graphs for Learning Molecular Fingerprints , 2015, NIPS.

[25]  Dina Katabi,et al.  Circuit-GNN: Graph Neural Networks for Distributed Circuit Design , 2019, ICML.

[26]  Jaewoo Kang,et al.  Graph Transformer Networks , 2019, NeurIPS.

[27]  Svetha Venkatesh,et al.  Graph Transformation Policy Network for Chemical Reaction Prediction , 2018, KDD.

[28]  Yujia Li,et al.  Learning the Graphical Structure of Electronic Health Records with Graph Convolutional Transformer , 2020, AAAI.

[29]  Yuan He,et al.  Graph Neural Networks for Social Recommendation , 2019, WWW.

[30]  Regina Barzilay,et al.  Junction Tree Variational Autoencoder for Molecular Graph Generation , 2018, ICML.

[31]  Jure Leskovec,et al.  Modeling polypharmacy side effects with graph convolutional networks , 2018, bioRxiv.

[32]  Razvan Pascanu,et al.  Interaction Networks for Learning about Objects, Relations and Physics , 2016, NIPS.

[33]  Max Welling,et al.  Graph Convolutional Matrix Completion , 2017, ArXiv.

[34]  Jure Leskovec,et al.  Strategies for Pre-training Graph Neural Networks , 2020, ICLR.

[35]  Regina Barzilay,et al.  Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network , 2017, NIPS.

[36]  Jure Leskovec,et al.  How Powerful are Graph Neural Networks? , 2018, ICLR.

[37]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[38]  Jacek Tabor,et al.  Molecule Attention Transformer , 2020, ArXiv.

[39]  Zhichun Wang,et al.  Cross-lingual Knowledge Graph Alignment via Graph Convolutional Networks , 2018, EMNLP.

[40]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[41]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[42]  Zhiyuan Liu,et al.  Graph Neural Networks: A Review of Methods and Applications , 2018, AI Open.

[43]  Stephan Günnemann,et al.  Directional Message Passing for Molecular Graphs , 2020, ICLR.

[44]  Seokjun Seo,et al.  Hybrid Approach of Relation Network and Localized Graph Convolutional Filtering for Breast Cancer Subtype Classification , 2017, IJCAI.

[45]  Xiaojun Wan,et al.  AMR-To-Text Generation with Graph Transformer , 2020, TACL.

[46]  Mirella Lapata,et al.  Text Generation from Knowledge Graphs with Graph Transformers , 2019, NAACL.

[47]  T. Jaakkola,et al.  Hierarchical Graph-to-Graph Translation for Molecules , 2019 .

[48]  Xinlei Chen,et al.  Iterative Visual Reasoning Beyond Convolutions , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Jure Leskovec,et al.  Graph Convolutional Neural Networks for Web-Scale Recommender Systems , 2018, KDD.

[50]  Yuji Matsumoto,et al.  Knowledge Transfer for Out-of-Knowledge-Base Entities: A Graph Neural Network Approach , 2017, ArXiv.

[51]  Nikos Komodakis,et al.  GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders , 2018, ICANN.

[52]  Jean-Louis Reymond,et al.  Enumeration of 166 Billion Organic Small Molecules in the Chemical Universe Database GDB-17 , 2012, J. Chem. Inf. Model..

[53]  Jure Leskovec,et al.  GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models , 2018, ICML.

[54]  Renjie Liao,et al.  Efficient Graph Generation with Graph Recurrent Attention Networks , 2019, NeurIPS.