Deep Graph Attention Model

Graph classification is a problem with practical applications in many different domains. Most existing methods take the entire graph into account when computing graph features; in a graphlet-based approach, for instance, the whole graph is processed to count the occurrences of different graphlets (small subgraphs). In the real world, however, graphs can be both large and noisy, with discriminative patterns confined to certain regions of the graph. In this work, we study the problem of attentional processing for graph classification. Attention allows the model to focus on small but informative parts of the graph while avoiding noise in the rest of it. We present a novel RNN model, called the Graph Attention Model (GAM), that processes only a portion of the graph by adaptively selecting a sequence of "interesting" nodes. The model is equipped with an external memory component that allows it to integrate information gathered from different parts of the graph. We demonstrate the effectiveness of the model through various experiments.
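To make the traversal idea concrete, here is a minimal sketch, assuming PyTorch, of the kind of attention-guided walk the abstract describes: an LSTM core integrates the features of visited nodes, and a small scoring network ranks the current node's neighbors to decide where to step next. The class and module names (`GraphAttentionWalker`, `rank_net`) are hypothetical, and the external memory component is omitted for brevity; this is a sketch of the general technique, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GraphAttentionWalker(nn.Module):
    """Classify a graph by visiting only a short sequence of nodes."""

    def __init__(self, feat_dim, hidden_dim, num_classes):
        super().__init__()
        # LSTM core integrates information from the nodes visited so far.
        self.core = nn.LSTMCell(feat_dim, hidden_dim)
        # Scores a candidate neighbor given the current history state.
        self.rank_net = nn.Linear(hidden_dim + feat_dim, 1)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, node_feats, adj, start, num_steps=5):
        # node_feats: (num_nodes, feat_dim) tensor; adj: dict of neighbor lists.
        h = torch.zeros(1, self.core.hidden_size)
        c = torch.zeros(1, self.core.hidden_size)
        node = start
        for _ in range(num_steps):
            h, c = self.core(node_feats[node].unsqueeze(0), (h, c))
            neighbors = adj[node]
            if not neighbors:
                break
            # Attention step: rank the neighbors of the current node and
            # greedily move to the highest-scoring ("most interesting") one.
            scores = torch.stack([
                self.rank_net(torch.cat([h.squeeze(0), node_feats[n]]))
                for n in neighbors
            ]).squeeze(-1)
            node = neighbors[int(scores.argmax())]
        # Predict the graph label from the state accumulated along the walk.
        return self.classifier(h.squeeze(0))


# Usage on a toy 4-node graph:
feats = torch.randn(4, 8)
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
model = GraphAttentionWalker(feat_dim=8, hidden_dim=16, num_classes=2)
logits = model(feats, adj, start=0)
```

The sketch picks the next node greedily for simplicity; because a hard node selection is non-differentiable, a stochastic selection trained with policy gradients (e.g., REINFORCE) would be the natural alternative in practice.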
