Explaining NLP Models via Minimal Contrastive Editing (MiCE)

Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some counterfactual event (the contrast case). Despite the influential role contrastivity plays in how humans explain, this property is largely missing from current methods for explaining NLP models. We present Minimal Contrastive Editing (MiCE), a method for producing contrastive explanations of model predictions in the form of edits to inputs that change model outputs to the contrast case. Our experiments across three tasks (binary sentiment classification, topic classification, and multiple-choice question answering) show that MiCE produces edits that are not only contrastive but also minimal and fluent, consistent with human contrastive edits. We demonstrate how MiCE edits support two use cases in NLP system development, debugging incorrect model outputs and uncovering dataset artifacts, and thereby illustrate that producing contrastive explanations is a promising research direction for model interpretability.
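To make the idea of a contrastive edit concrete, here is a minimal sketch in Python. It searches for a small change to an input that flips a sentiment classifier's prediction to a chosen contrast label. This toy brute-force, single-word substitution search only stands in for MiCE's actual editor-based generation and minimality search; the default Hugging Face pipeline model and the candidate word list are illustrative assumptions, not part of the paper.

```python
# Illustrative sketch only: a toy contrastive-edit search, not the MiCE method.
# Requires: pip install transformers torch
from transformers import pipeline

# Default sentiment pipeline (DistilBERT fine-tuned on SST-2); an assumption
# made for this sketch, not the predictor used in the paper's experiments.
clf = pipeline("sentiment-analysis")

def predict(text):
    """Return the classifier's (label, score) for a single input string."""
    out = clf(text)[0]
    return out["label"], out["score"]

# Hypothetical candidate words for substitution; a real system would generate
# candidates with a trained editor rather than use a fixed list.
CANDIDATES = ("great", "terrible", "boring", "wonderful")

def contrastive_edit(text, contrast_label, candidates=CANDIDATES):
    """Greedily try single-word substitutions and return the first edit that
    moves the prediction to contrast_label, i.e., a (very) minimal edit."""
    words = text.split()
    for i in range(len(words)):
        for cand in candidates:
            edited_text = " ".join(words[:i] + [cand] + words[i + 1:])
            label, score = predict(edited_text)
            if label == contrast_label:
                return edited_text, label, score
    return None  # no single-word edit flips the prediction; widen the search

if __name__ == "__main__":
    original = "The movie was great and I enjoyed every minute."
    print(predict(original))                       # e.g., ('POSITIVE', 0.99...)
    print(contrastive_edit(original, "NEGATIVE"))  # a one-word contrastive edit
```

The returned edit plays the role of a contrastive explanation: the difference between the original and edited inputs indicates what would have had to change for the model to predict the contrast case. MiCE itself additionally optimizes edits for fluency, which this sketch does not attempt.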
