Universal Self-Attention Network for Graph Classification

We address a limitation of graph neural networks (GNNs) for graph classification: the lack of a mechanism to exploit dependencies among nodes, often caused by inefficient aggregation over nodes' neighbors. To this end, we present U2GNN -- a novel embedding model leveraging the transformer self-attention network -- to learn plausible node and graph embeddings. In particular, U2GNN induces a powerful aggregation function, using a self-attention mechanism followed by a recurrent transition, to update the vector representation of each node from its neighbors. As a consequence, U2GNN effectively infers potential dependencies among nodes, leading to better modeling of graph structure. Experimental results show that the proposed U2GNN achieves state-of-the-art accuracies on benchmark datasets for the graph classification task. Our code is available at: \url{this https URL}.
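The aggregation step described above -- scaled dot-product self-attention over a node's neighborhood, followed by a transition that updates the node's vector -- can be sketched roughly as below. This is an illustrative sketch only: the function name, the weight parameterization, and the use of a simple `tanh` update as a stand-in for the paper's recurrent transition are all assumptions, not U2GNN's exact formulation.

```python
import numpy as np

def self_attention_aggregate(h, neighbors, W_q, W_k, W_v):
    """One hedged sketch of an attention-based aggregation step:
    each node attends over its neighborhood (itself plus its
    neighbors) with scaled dot-product attention, then a simple
    nonlinear transition (a stand-in for the recurrent transition)
    produces the node's updated vector."""
    d = W_q.shape[1]
    h_new = np.empty_like(h)
    for v, nbrs in neighbors.items():
        ctx = h[[v] + nbrs]            # neighborhood features, shape (k, d_in)
        q = h[v] @ W_q                 # query from node v
        k = ctx @ W_k                  # keys from the neighborhood
        scores = k @ q / np.sqrt(d)    # scaled dot-product scores
        alpha = np.exp(scores - scores.max())
        alpha /= alpha.sum()           # softmax attention weights
        agg = alpha @ (ctx @ W_v)      # attention-weighted sum of values
        h_new[v] = np.tanh(agg)        # assumed stand-in for the transition
    return h_new
```

Stacking several such steps lets information propagate beyond immediate neighbors, which is how attention-based aggregation can capture longer-range dependencies among nodes.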
