Flexible Instance-Specific Rationalization of NLP Models

Recent research on model interpretability in natural language processing extensively uses feature scoring methods to identify which parts of the input are most important for a model's prediction (i.e. the explanation or rationale). However, previous work has shown that there is no single best scoring method across text classification tasks, while practitioners typically also have to make several ad-hoc choices about the length and type of the rationale (e.g. short or long, contiguous or not). Motivated by this, we propose a simple yet effective and flexible method that selects, for each data instance: (1) the feature scoring method; (2) the length; and (3) the type of the rationale. Our method builds on input erasure approaches to interpretability, which assume that the most faithful rationale for a prediction is the one whose removal causes the largest difference between the model's output distribution on the full text and on the text with the rationale removed. Evaluation on four standard text classification datasets shows that our method produces more faithful, comprehensive and highly sufficient explanations than using a fixed feature scoring method, rationale length and type. More importantly, we demonstrate that a practitioner does not need to make any ad-hoc choices in order to extract faithful rationales with our approach.
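
To make the selection criterion concrete, below is a minimal Python sketch of the input-erasure selection step. It assumes a classifier exposed as a predict_proba callable and a pool of candidate rationales already extracted with different (scoring method, length, type) combinations; the names select_rationale, candidate_rationales and the use of KL divergence as the distance between output distributions are illustrative assumptions, not necessarily the exact formulation used in the paper.

    # Sketch of per-instance rationale selection via input erasure.
    # predict_proba and candidate_rationales are assumed interfaces for
    # illustration, not the authors' actual implementation.
    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        """KL(p || q) between two probability vectors (one possible distance)."""
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        return float(np.sum(p * np.log(p / q)))

    def select_rationale(tokens, candidate_rationales, predict_proba):
        """Return the candidate rationale whose removal shifts the model's
        output distribution the most (input-erasure criterion).

        tokens: list of input tokens
        candidate_rationales: list of sets of token indices, each produced by
            a different (feature scoring method, length, type) combination
        predict_proba: callable mapping a token list to a probability vector
        """
        full_dist = predict_proba(tokens)
        best_rationale, best_score = None, float("-inf")
        for rationale in candidate_rationales:
            # Erase the candidate rationale from the input.
            reduced = [t for i, t in enumerate(tokens) if i not in rationale]
            erased_dist = predict_proba(reduced)
            # Larger divergence => removing this rationale changes the
            # prediction more, i.e. the rationale is more faithful.
            score = kl_divergence(full_dist, erased_dist)
            if score > best_score:
                best_rationale, best_score = rationale, score
        return best_rationale, best_score

In practice, the candidate pool would be built by running each available feature scoring method (e.g. attention- or gradient-based saliency) at several rationale lengths and with both contiguous and non-contiguous selection, so that the erasure criterion can pick the best combination per instance.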
