Explaining NLP Models via Minimal Contrastive Editing (MiCE)

Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some other counterfactual event (the contrast case). Despite the influential role that contrastivity plays in how humans explain, this property is largely missing from current methods for explaining NLP models. We present Minimal Contrastive Editing (MiCE), a method for producing contrastive explanations of model predictions in the form of edits to inputs that change model outputs to the contrast case. Our experiments across three tasks (binary sentiment classification, topic classification, and multiple-choice question answering) show that MiCE produces edits that are not only contrastive, but also minimal and fluent, consistent with human contrastive edits. We demonstrate how MiCE edits can be used for two use cases in NLP system development, debugging incorrect model outputs and uncovering dataset artifacts, and thereby illustrate that producing contrastive explanations is a promising research direction for model interpretability.
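To make the objective concrete, below is a minimal sketch of contrastive editing: search for a small, fluent change to an input that flips a classifier's prediction to a chosen contrast label. This is an illustration under stated assumptions, not the paper's method. MiCE itself uses a fine-tuned T5 editor that infills masked spans conditioned on the contrast label; this sketch instead uses a greedy one-token-at-a-time search with an off-the-shelf masked LM. The model names and the `minimal_contrastive_edit` helper are choices made here for the demo.

```python
# A greedy sketch of the contrastive-editing objective: replace one token at a
# time, using a masked LM to propose fluent substitutions, until the predictor
# flips to the contrast label. Illustration only; MiCE's actual editor is a
# label-conditioned T5 infiller with binary search over mask fractions.

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")
classify = pipeline("text-classification",
                    model="distilbert-base-uncased-finetuned-sst-2-english")

def predicted_label(text: str) -> str:
    return classify(text)[0]["label"]

def minimal_contrastive_edit(text: str, contrast_label: str, max_edits: int = 5):
    """Greedily edit `text` until the classifier predicts `contrast_label`."""
    tokens = text.split()
    for _ in range(max_edits):
        if predicted_label(" ".join(tokens)) == contrast_label:
            return " ".join(tokens)  # contrastive edit found
        best = None
        for i in range(len(tokens)):
            masked = tokens[:i] + [fill_mask.tokenizer.mask_token] + tokens[i + 1:]
            # Propose fluent replacements for position i with the masked LM.
            for cand in fill_mask(" ".join(masked), top_k=5):
                new_tokens = tokens[:i] + [cand["token_str"].strip()] + tokens[i + 1:]
                scores = classify(" ".join(new_tokens), top_k=None)
                contrast_p = next(s["score"] for s in scores
                                  if s["label"] == contrast_label)
                if best is None or contrast_p > best[0]:
                    best = (contrast_p, new_tokens)
        if best is None:
            break
        tokens = best[1]  # commit to the single best one-token edit
    final = " ".join(tokens)
    return final if predicted_label(final) == contrast_label else None

# Example: flip a positive review to the contrast case NEGATIVE.
print(minimal_contrastive_edit("The movie was wonderful.", "NEGATIVE"))
```

The greedy loop makes the minimality trade-off explicit: each iteration commits to the single token swap that most raises the contrast label's probability, so the number of changed tokens is bounded by `max_edits`. MiCE's two-stage procedure reaches the contrast case more reliably because its editor generates text directed toward the target label, rather than having candidates proposed blindly by a generic masked LM.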
