Measuring Association Between Labels and Free-Text Rationales

Interpretable NLP has taken increasing interest in ensuring that explanations are faithful to the model's decision-making process, a property that is crucial for machine learning researchers and practitioners who use explanations to better understand models. While prior work focuses primarily on extractive rationales (a subset of the input elements), we investigate their less-studied counterpart: free-text natural language rationales. We demonstrate that existing models for faithful interpretability do not extend cleanly to tasks where free-text rationales are needed. We therefore turn to models that jointly predict and rationalize, a common class of models for free-text rationalization whose faithfulness has not yet been established. We propose measurements of label-rationale association, a necessary property of faithful rationales, for these models. Using our measurements, we show that a state-of-the-art joint model based on T5 exhibits both strengths and weaknesses for producing faithful rationales.
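To make the setup concrete, below is a minimal sketch of how one might probe label-rationale association in a joint self-rationalizing T5 model. The checkpoint name, the "explain nli:" prompt prefix, the "<label> explanation: <rationale>" output format, and the token-dropping perturbation are illustrative assumptions, not the authors' exact measurements: the idea is simply that, under controlled input perturbations, a faithfully associated rationale should tend to change when the predicted label changes.

```python
# Hypothetical sketch: probing label-rationale association in a joint
# self-rationalizing T5 model. Checkpoint name, prompt prefix, and output
# format are placeholders, not the paper's exact setup.
import random

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

MODEL_NAME = "t5-base"  # placeholder; a rationale-finetuned checkpoint is assumed
tokenizer = T5Tokenizer.from_pretrained(MODEL_NAME)
model = T5ForConditionalGeneration.from_pretrained(MODEL_NAME).eval()


def predict_with_rationale(text: str) -> tuple[str, str]:
    """Generate a label and a free-text rationale in one decoding pass.

    Assumes a WT5-style output of the form "<label> explanation: <rationale>".
    """
    inputs = tokenizer("explain nli: " + text, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=64)
    decoded = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    label, _, rationale = decoded.partition("explanation:")
    return label.strip(), rationale.strip()


def add_token_noise(text: str, drop_prob: float = 0.1, seed: int = 0) -> str:
    """Randomly drop input tokens to simulate a controlled perturbation."""
    rng = random.Random(seed)
    tokens = text.split()
    kept = [t for t in tokens if rng.random() > drop_prob]
    return " ".join(kept) if kept else text


def association_probe(text: str, n_trials: int = 10) -> float:
    """Fraction of label-flipping perturbations where the rationale also changes.

    High agreement is a necessary (not sufficient) signal that the generated
    rationale is associated with the predicted label.
    """
    base_label, base_rationale = predict_with_rationale(text)
    joint_changes, label_changes = 0, 0
    for seed in range(n_trials):
        label, rationale = predict_with_rationale(add_token_noise(text, seed=seed))
        if label != base_label:
            label_changes += 1
            if rationale != base_rationale:
                joint_changes += 1
    return joint_changes / max(label_changes, 1)
```

A probe like this only checks a necessary condition: if labels flip under perturbation while rationales stay fixed (or vice versa), the two outputs are unlikely to be driven by the same internal computation.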
