Universal Graph Transformer Self-Attention Networks

The transformer self-attention network has been widely adopted in research domains such as computer vision, image processing, and natural language processing. However, it has not yet seen comparable use in graph neural networks (GNNs), where constructing an advanced aggregation function is essential. To this end, we present U2GNN, an effective GNN model that leverages a transformer self-attention mechanism followed by a recurrent transition to induce a powerful aggregation function for learning graph representations. Experimental results show that U2GNN achieves state-of-the-art accuracies on well-known benchmark datasets for graph classification. Our code is available at: https://github.com/daiquocnguyen/Graph-Transformer
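Below is a minimal sketch of the kind of aggregation the abstract describes, written in PyTorch: transformer self-attention over a node together with its sampled neighbors, followed by a recurrent transition, repeated for a fixed number of steps. The class and parameter names (U2GNNLayer, num_steps, neighbor_idx), the choice of a GRU cell for the recurrent transition, and the sum readout are illustrative assumptions rather than the authors' exact implementation; see the linked repository for the reference code.

```python
import torch
import torch.nn as nn

class U2GNNLayer(nn.Module):
    """Illustrative U2GNN-style aggregation: self-attention over each node
    and its sampled neighbors, then a recurrent transition, for num_steps."""

    def __init__(self, dim: int, num_heads: int = 1, num_steps: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.transition = nn.GRUCell(dim, dim)  # recurrent transition (assumption)
        self.num_steps = num_steps

    def forward(self, x: torch.Tensor, neighbor_idx: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim) node features
        # neighbor_idx: (num_nodes, k) indices of k sampled neighbors per node
        h = x
        for _ in range(self.num_steps):
            # Sequence of length 1 + k per node: the node itself plus its neighbors.
            seq = torch.cat([h.unsqueeze(1), h[neighbor_idx]], dim=1)
            attn_out, _ = self.attn(seq, seq, seq)
            # Keep the attended representation at the node's own position
            # and pass it through the recurrent transition.
            h = self.transition(attn_out[:, 0, :], h)
        return h

# Toy usage: 5 nodes with 16-dim features, 3 sampled neighbors each.
x = torch.randn(5, 16)
neighbor_idx = torch.randint(0, 5, (5, 3))
layer = U2GNNLayer(dim=16)
node_embeddings = layer(x, neighbor_idx)      # (5, 16)
graph_embedding = node_embeddings.sum(dim=0)  # sum readout for graph classification (assumption)
```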
