Discovering Invariant Rationales for Graph Neural Networks

Intrinsic interpretability of graph neural networks (GNNs) aims to find a small subset of the input graph's features -- the rationale -- that guides the model's prediction. Unfortunately, leading rationalization models often rely on data biases, especially shortcut features, to compose rationales and make predictions without probing the critical and causal patterns. Moreover, such data biases easily change outside the training distribution, so these models suffer a sharp drop in both interpretability and predictive performance on out-of-distribution data. In this work, we propose a new strategy, discovering invariant rationales (DIR), to construct intrinsically interpretable GNNs. It conducts interventions on the training distribution to create multiple interventional distributions, then identifies the causal rationales that remain invariant across these distributions while filtering out the spurious patterns that are unstable. Experiments on both synthetic and real-world graph classification datasets validate the superiority of DIR over leading baselines in terms of interpretability and generalization ability. Code and datasets are available at https://github.com/Wuyxin/DIR-GNN.
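
To make the invariance idea concrete, below is a minimal sketch (not the authors' implementation) of a DIR-style objective. It assumes an upstream rationale generator has already encoded each graph into a causal-part representation h_c and a spurious-part representation h_s; the names JointClassifier, dir_objective, and the batch-wise "swap" intervention are illustrative choices, and the actual architecture and intervention scheme in the released code may differ.

```python
# Sketch of a DIR-style objective: average risk over interventional
# distributions plus a variance penalty that favours rationales whose
# predictive power is stable when the spurious part is intervened on.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointClassifier(nn.Module):
    """Predicts the label from the causal part, modulated by the spurious part."""

    def __init__(self, dim, num_classes):
        super().__init__()
        self.causal_head = nn.Linear(dim, num_classes)
        self.spurious_gate = nn.Linear(dim, num_classes)

    def forward(self, h_c, h_s):
        # Combine causal logits with a gate computed from the (possibly
        # intervened) spurious representation.
        return self.causal_head(h_c) * torch.sigmoid(self.spurious_gate(h_s))


def dir_objective(model, h_c, h_s, y, lam=1.0):
    """Mean risk over interventional distributions plus a variance penalty.

    Interventions are simulated by pairing each causal representation with a
    spurious representation taken from a different graph in the batch.
    """
    risks = []
    for shift in range(h_s.size(0)):
        h_s_do = h_s.roll(shifts=shift, dims=0)  # do(S = s'): swap spurious parts
        logits = model(h_c, h_s_do)
        risks.append(F.cross_entropy(logits, y))
    risks = torch.stack(risks)
    # Rationales whose risk is stable across interventions are favoured;
    # the variance term penalizes unstable (spurious) patterns.
    return risks.mean() + lam * risks.var()


if __name__ == "__main__":
    torch.manual_seed(0)
    batch, dim, num_classes = 8, 16, 2
    model = JointClassifier(dim, num_classes)
    h_c, h_s = torch.randn(batch, dim), torch.randn(batch, dim)
    y = torch.randint(0, num_classes, (batch,))
    loss = dir_objective(model, h_c, h_s, y)
    loss.backward()
    print(f"DIR objective: {loss.item():.4f}")
```

The design choice to keep: the classifier is allowed to see the spurious part, but the variance penalty over interventional risks pushes the generator toward causal subgraphs whose predictions do not depend on it.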
