Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals

Token-level attributions have been extensively studied to explain model predictions for a wide range of classification tasks in NLP (e.g., sentiment analysis), but such explanation techniques are less explored for machine reading comprehension (RC) tasks. Although the transformer-based models used for RC are identical to those used for classification, the underlying reasoning these models perform is very different, and different types of explanations are required. We propose a methodology to evaluate explanations: an explanation should allow us to understand the RC model's high-level behavior with respect to a set of realistic counterfactual input scenarios. We define these counterfactuals for several RC settings, and by connecting explanation techniques' outputs to high-level model behavior, we can evaluate how useful different explanations really are. Our analysis suggests that pairwise explanation techniques are better suited to RC than token-level attributions, which are often unfaithful in the scenarios we consider. We additionally propose an improvement to an attention-based attribution technique, resulting in explanations that better reveal the model's behavior.
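To make the evaluation idea concrete, here is a minimal sketch (not the paper's implementation) of how an explanation can be scored against realistic counterfactuals: use the explanation to simulate whether the model's answer should change under a given edit, then compare that simulation to the model's actual behavior. The helpers `model_predict`, `attribute`, and `make_counterfactual` are hypothetical stand-ins for an RC model, an attribution method, and a counterfactual editor.

```python
# Minimal sketch, assuming hypothetical helpers:
#   attribute(question, passage)          -> dict mapping tokens to importance scores
#   model_predict(question, passage)      -> predicted answer string
#   make_counterfactual(question, passage)-> edited passage (realistic counterfactual)

def behavior_predicted_by_explanation(question, passage, edited_passage, attribute):
    """Simulate the model from its explanation: if the counterfactual edit removes
    tokens the explanation marks as important, predict that the answer will change."""
    scores = attribute(question, passage)
    edited_tokens = set(passage.split()) - set(edited_passage.split())
    touched_importance = sum(scores.get(tok, 0.0) for tok in edited_tokens)
    return touched_importance > 0.5  # hypothetical decision threshold

def faithfulness_on_counterfactuals(examples, model_predict, attribute, make_counterfactual):
    """Fraction of counterfactuals where the explanation-based simulation agrees
    with the model's actual change in behavior."""
    agree = 0
    for question, passage in examples:
        edited = make_counterfactual(question, passage)
        predicted_change = behavior_predicted_by_explanation(
            question, passage, edited, attribute)
        actual_change = (model_predict(question, passage)
                         != model_predict(question, edited))
        agree += int(predicted_change == actual_change)
    return agree / len(examples)
```

An explanation technique whose simulated behavior frequently disagrees with the model's actual behavior on such counterfactuals would, under this framing, be considered unfaithful.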
