Structured self-attention architecture for graph-level representation learning

Abstract Graph neural networks (GNNs) have recently been shown to be effective at learning representative graph features. However, current pooling-based strategies for graph classification make inefficient use of graph representation information, because every node and every layer contributes equally to the graph-level representation. In this paper, we develop a novel architecture for extracting an effective graph representation by introducing structured multi-head self-attention, in which the attention mechanism takes three forms: node-focused, layer-focused and graph-focused. To make full use of the information in a graph, the node-focused self-attention first aggregates neighbor node features using scaled dot-product attention; the layer-focused and graph-focused self-attention then serve as a readout module that measures the importance of different nodes and layers to the model's output. Moreover, the two readout self-attention mechanisms can be combined with base node-level GNNs to improve performance on graph classification tasks. The proposed Structured Self-attention Architecture is evaluated on two kinds of graph benchmarks: bioinformatics datasets and social network datasets. Extensive experiments demonstrate improved predictive accuracy over existing methods.
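The pipeline described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: it assumes a single head, uses raw node features as queries, keys and values (no learned projections), and replaces the learned readout parameters with fixed, hypothetical query vectors (`node_query`, `layer_query`). All function names are illustrative.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def node_attention(features, neighbors):
    # Node-focused step: each node aggregates its neighbors' features,
    # weighted by softmax(h_i . h_j / sqrt(d)) over its neighborhood.
    d = len(features[0])
    out = []
    for i, h_i in enumerate(features):
        nbrs = neighbors[i]
        weights = softmax([dot(h_i, features[j]) / math.sqrt(d) for j in nbrs])
        out.append([sum(w * features[j][k] for w, j in zip(weights, nbrs))
                    for k in range(d)])
    return out

def attention_readout(vectors, query):
    # Attention-weighted sum of vectors; `query` stands in for a learned
    # parameter (an assumption of this sketch).
    weights = softmax([dot(query, v) for v in vectors])
    d = len(vectors[0])
    pooled = [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(d)]
    return pooled, weights

def graph_representation(features, neighbors, num_layers, node_query, layer_query):
    # Pool nodes within each layer, then pool the per-layer summaries.
    h = features
    layer_summaries = []
    for _ in range(num_layers):
        h = node_attention(h, neighbors)
        summary, _ = attention_readout(h, node_query)
        layer_summaries.append(summary)
    return attention_readout(layer_summaries, layer_query)

# Toy path graph 0 - 1 - 2; each neighborhood includes the node itself.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
graph_vec, layer_weights = graph_representation(
    features, neighbors, num_layers=2,
    node_query=[1.0, 0.0], layer_query=[0.0, 1.0])
```

Here `layer_weights` plays the role of the per-layer importance scores, and the weights returned by the inner `attention_readout` call play the role of per-node importance; in a trained model both would come from learned parameters rather than fixed vectors.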
