Towards Explanation for Unsupervised Graph-Level Representation Learning

Due to the superior performance of Graph Neural Networks (GNNs) in various domains, there is increasing interest in the GNN explanation problem: "Which fraction of the input graph is most crucial to the model's decision?" Existing explanation methods focus on supervised settings, e.g., node classification and graph classification, while explanation for unsupervised graph-level representation learning remains unexplored. The opaqueness of graph representations may lead to unexpected risks when they are deployed in high-stakes decision-making scenarios. In this paper, we advance the Information Bottleneck (IB) principle to tackle the explanation problem for unsupervised graph representations, which leads to a novel principle, the Unsupervised Subgraph Information Bottleneck (USIB). We also theoretically analyze the connection between graph representations and explanatory subgraphs on the label space, which reveals that the expressiveness and robustness of representations benefit the fidelity of explanatory subgraphs. Experimental results on both synthetic and real-world datasets demonstrate the superiority of our explainer and the validity of our theoretical analysis.
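To make the principle concrete, here is a minimal sketch of the objectives involved; the notation ($G$, $G_{sub}$, $Y$, $\beta$) is assumed for illustration and is not taken from the paper body. The classical Information Bottleneck compresses an input $X$ into a representation $Z$ that remains predictive of a target $Y$:

    \min_{p(z \mid x)} \; I(X; Z) - \beta\, I(Z; Y).

A USIB-style objective, as described in the abstract, would replace the supervised target with the unsupervised graph representation: given a graph $G$ with learned representation $Y$, seek the explanatory subgraph

    G_{sub}^{*} = \arg\max_{G_{sub} \subseteq G} \; I(Y; G_{sub}) - \beta\, I(G; G_{sub}),

where $\beta > 0$ trades off informativeness to the representation against compression of the input graph. This is a sketch consistent with the abstract, not necessarily the paper's exact formulation.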
