Unsupervised Universal Self-Attention Network for Graph Classification

Existing graph embedding models often struggle to exploit graph structure similarities, potential dependencies among nodes, and global network properties. To address this, we present U2GAN, a novel unsupervised model that leverages the recently introduced universal self-attention network (Dehghani et al., 2019) to learn low-dimensional graph embeddings that can be used for graph classification. Given an input graph, U2GAN first applies a self-attention computation and then a recurrent transition, iteratively memorizing its attention on the vector representations of each node and its neighbors across iterations. In this way, U2GAN addresses the weaknesses of existing models and produces plausible node embeddings, whose sum is the final embedding of the whole graph. Experimental results show that the unsupervised U2GAN achieves new state-of-the-art performance on a range of well-known benchmark datasets for the graph classification task. It even outperforms supervised methods in most benchmark cases.
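The computation described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the helper names (`self_attention`, `transition`, `graph_embedding`), the single attention head, the use of all neighbors rather than a sampled subset, the omission of layer normalization, and the random untrained weights are all assumptions made for illustration. It only shows the shape of the pipeline: attention over a node and its neighbors, a transition step, weights shared across iterations, and a sum over node embeddings.

```python
import math
import random

def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over the rows of x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    return matmul([softmax(r) for r in scores], v)

def transition(x, w1, w2):
    """Position-wise feed-forward transition with a ReLU in between."""
    hidden = [[max(0.0, h) for h in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)

def add(a, b):
    """Element-wise sum of two matrices (used for residual connections)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def graph_embedding(features, neighbors, params, steps=2):
    """Refine node vectors for `steps` iterations, then sum them.

    Assumption: the same weights are reused at every iteration, in the
    spirit of a universal (recurrent-in-depth) self-attention network.
    """
    h = [list(f) for f in features]
    for _ in range(steps):
        new_h = []
        for node, nbrs in enumerate(neighbors):
            # Sequence = the node itself followed by its neighbors.
            seq = [h[node]] + [h[n] for n in nbrs]
            seq = add(seq, self_attention(seq, params["wq"], params["wk"], params["wv"]))
            seq = add(seq, transition(seq, params["w1"], params["w2"]))
            new_h.append(seq[0])  # keep the updated vector of the node itself
        h = new_h
    # Graph embedding = sum of the final node embeddings.
    return [sum(col) for col in zip(*h)]

def random_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Toy graph (hypothetical): node 0 is connected to nodes 1 and 2.
rng = random.Random(0)
dim = 4
params = {name: random_matrix(dim, dim, rng) for name in ("wq", "wk", "wv", "w1", "w2")}
features = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
neighbors = [[1, 2], [0], [0]]
embedding = graph_embedding(features, neighbors, params)
print(len(embedding))
```

In an actual unsupervised setup the weights would be trained with an objective defined by the paper, and the resulting graph embeddings would then be passed to a separate classifier; neither step is shown here.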
