Graph Neural Networks Including Sparse Interpretability

Graph Neural Networks (GNNs) are versatile, powerful machine learning methods that enable graph structure and feature representation learning, and have applications across many domains. For applications critically requiring interpretation, attention-based GNNs have been leveraged. However, these approaches either rely on specific model architectures or lack a joint consideration of graph structure and node features in their interpretation. Here we present a model-agnostic framework for interpreting important graph structure and node features, Graph neural networks Including SparSe inTerpretability (GISST). With any GNN model, GISST combines an attention mechanism and sparsity regularization to yield an important subgraph and node feature subset related to any graph-based task. Through a single self-attention layer, a GISST model learns an importance probability for each node feature and edge in the input graph. By including these importance probabilities in the model loss function, the probabilities are optimized end-to-end and tied to the task-specific performance. Furthermore, GISST sparsifies these importance probabilities with entropy and L1 regularization to reduce noise in the input graph topology and node features. Our GISST models achieve superior node feature and edge explanation precision in synthetic datasets, as compared to alternative interpretation approaches. Moreover, our GISST models are able to identify important graph structure in real-world datasets. We demonstrate in theory that edge feature importance and multiple edge types can be considered by incorporating them into the GISST edge probability computation. By jointly accounting for topology, node features, and edge features, GISST inherently provides simple and relevant interpretations for any GNN models and tasks.

[1]  A. Barabasi,et al.  The human disease network , 2007, Proceedings of the National Academy of Sciences.

[2]  J. Kazius,et al.  Derivation and validation of toxicophores for mutagenicity prediction. , 2005, Journal of medicinal chemistry.

[3]  Cynthia Rudin,et al.  Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead , 2018, Nature Machine Intelligence.

[4]  Been Kim,et al.  Sanity Checks for Saliency Maps , 2018, NeurIPS.

[5]  Albert-László Barabási,et al.  Network-based prediction of protein interactions , 2018, Nature Communications.

[6]  Richard S. Zemel,et al.  Gated Graph Sequence Neural Networks , 2015, ICLR.

[7]  Pascal Vincent,et al.  Visualizing Higher-Layer Features of a Deep Network , 2009 .

[8]  Lorenzo Livi,et al.  Graph Neural Networks With Convolutional ARMA Filters , 2019, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Marinka Zitnik,et al.  Network Medicine Framework for Identifying Drug Repurposing Opportunities for COVID-19 , 2020, ArXiv.

[10]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[11]  Jure Leskovec,et al.  Modeling polypharmacy side effects with graph convolutional networks , 2018, bioRxiv.

[12]  Yuchen Zhang,et al.  L1-regularized Neural Networks are Improperly Learnable in Polynomial Time , 2015, ICML.

[13]  Jure Leskovec,et al.  How Powerful are Graph Neural Networks? , 2018, ICLR.

[14]  Jure Leskovec,et al.  Graph Convolutional Neural Networks for Web-Scale Recommender Systems , 2018, KDD.

[15]  Mike Wu,et al.  Beyond Sparsity: Tree Regularization of Deep Models for Interpretability , 2017, AAAI.

[16]  Xavier Bresson,et al.  Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering , 2016, NIPS.

[17]  Jure Leskovec,et al.  GNNExplainer: Generating Explanations for Graph Neural Networks , 2019, NeurIPS.

[18]  Jure Leskovec,et al.  Interpretable & Explorable Approximations of Black Box Models , 2017, ArXiv.

[19]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[20]  Palash Goyal,et al.  Graph Embedding Techniques, Applications, and Performance: A Survey , 2017, Knowl. Based Syst..

[21]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[22]  Andrew Zisserman,et al.  Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps , 2013, ICLR.

[23]  A. Debnath,et al.  Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. , 1991, Journal of medicinal chemistry.

[24]  Weiqing Wang,et al.  Perturbation Biology: Inferring Signaling Networks in Cellular Systems , 2013, PLoS Comput. Biol..

[25]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[26]  Jan Eric Lenssen,et al.  Fast Graph Representation Learning with PyTorch Geometric , 2019, ArXiv.

[27]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[28]  Cynthia Rudin,et al.  This Looks Like That: Deep Learning for Interpretable Image Recognition , 2018 .

[29]  Eneldo Loza Mencía,et al.  DeepRED - Rule Extraction from Deep Neural Networks , 2016, DS.

[30]  Pinar Yanardag,et al.  Deep Graph Kernels , 2015, KDD.

[31]  R. Zemel,et al.  Neural Relational Inference for Interacting Systems , 2018, ICML.

[32]  Jure Leskovec,et al.  Representation Learning on Graphs: Methods and Applications , 2017, IEEE Data Eng. Bull..

[33]  Venkat Chandrasekaran,et al.  Complexity of Inference in Graphical Models , 2008, UAI.

[34]  Max Welling,et al.  Modeling Relational Data with Graph Convolutional Networks , 2017, ESWC.

[35]  Razvan Pascanu,et al.  Relational inductive biases, deep learning, and graph networks , 2018, ArXiv.

[36]  T. Kathirvalavakumar,et al.  Reverse Engineering the Neural Networks for Rule Extraction in Classification Problems , 2011, Neural Processing Letters.

[37]  Jure Leskovec,et al.  Graph Convolutional Policy Network for Goal-Directed Molecular Graph Generation , 2018, NeurIPS.

[38]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[39]  Song Han,et al.  AMC: AutoML for Model Compression and Acceleration on Mobile Devices , 2018, ECCV.

[40]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[41]  M. DePamphilis,et al.  HUMAN DISEASE , 1957, The Ulster Medical Journal.