Improving the Faithfulness of Attention-based Explanations with Task-specific Information for Text Classification

Neural network architectures in natural language processing often use attention mechanisms to produce probability distributions over input token representations. Attention has empirically been demonstrated to improve performance in various tasks, while its weights have been extensively used as explanations for model predictions. Recent studies (Jain and Wallace, 2019; Serrano and Smith, 2019; Wiegreffe and Pinter, 2019) have shown that attention cannot generally be considered a faithful explanation (Jacovi and Goldberg, 2020) across encoders and tasks. In this paper, we seek to improve the faithfulness of attention-based explanations for text classification. We achieve this by proposing a new family of Task-Scaling (TaSc) mechanisms that learn task-specific non-contextualised information to scale the original attention weights. Evaluation tests for explanation faithfulness show that the three proposed variants of TaSc improve attention-based explanations across two attention mechanisms, five encoders and five text classification datasets without sacrificing predictive performance. Finally, we demonstrate that TaSc consistently provides more faithful attention-based explanations compared to three widely-used interpretability techniques.
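To make the scaling idea concrete, the following PyTorch sketch shows one plausible reading of a TaSc-style mechanism: a learned, non-contextualised vector u produces a scalar score per token from its embedding alone, and that score rescales the original attention weights before pooling. This is a minimal sketch under stated assumptions, not the authors' implementation; the class name LinTaScAttention, the parameter u and the additive attention scorer are illustrative, and the paper's exact formulation and variants may differ.

```python
import torch
import torch.nn as nn


class LinTaScAttention(nn.Module):
    # Hypothetical sketch of a TaSc-style classifier: a task-specific,
    # non-contextualised scalar per token scales the original attention
    # weights before pooling. Names and exact formulation are assumptions.
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        # Additive (Tanh) attention scorer over contextualised states.
        self.attn_scorer = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        # Task-specific, non-contextualised scaling vector u: the per-token
        # score s_i is the dot product of u with the token's embedding.
        self.u = nn.Parameter(torch.randn(embed_dim) * 0.02)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        emb = self.embedding(token_ids)                    # (B, T, E)
        hidden, _ = self.encoder(emb)                      # (B, T, 2H)
        attn = torch.softmax(
            self.attn_scorer(hidden).squeeze(-1), dim=-1)  # (B, T) original weights
        scores = emb @ self.u                              # (B, T) non-contextualised scores
        scaled_attn = attn * scores                        # TaSc-scaled attention weights
        context = torch.bmm(scaled_attn.unsqueeze(1), hidden).squeeze(1)  # (B, 2H)
        return self.classifier(context), scaled_attn


# Toy usage: the returned scaled weights are what would be inspected
# as the attention-based explanation for each prediction.
model = LinTaScAttention(vocab_size=1000)
logits, explanation = model(torch.randint(0, 1000, (4, 20)))
```

Because the scaling scores come only from the non-contextualised embeddings, they carry token-level, task-specific information that is independent of the encoder's contextual mixing, which is the property the abstract attributes to TaSc.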

[1] Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.

[2] Xuanjing Huang, et al. How to Fine-Tune BERT for Text Classification?, 2019, CCL.

[3] Byron C. Wallace, et al. Attention is not Explanation, 2019, NAACL.

[4] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[5] Xiangnan Kong, et al. Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?, 2020, ACL.

[6] Jimeng Sun, et al. RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism, 2016, NIPS.

[7] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[8] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.

[9] Daniel Jurafsky, et al. Understanding Neural Networks through Representation Erasure, 2016, ArXiv.

[10] Bin Huang, et al. Opening the black box of neural networks: methods for interpreting neural network models in clinical applications, 2018, Annals of Translational Medicine.

[11] Klaus-Robert Müller, et al. Investigating the influence of noise and distractors on the interpretation of neural networks, 2016, ArXiv.

[12] Manaal Faruqui, et al. Attention Interpretability Across NLP Tasks, 2019, ArXiv.

[13] Yangfeng Ji, et al. Learning Variational Word Masks to Improve the Interpretability of Neural Text Classifiers, 2020, EMNLP.

[14] Li Zhao, et al. Attention-based LSTM for Aspect-level Sentiment Classification, 2016, EMNLP.

[15] Elijah Mayfield, et al. Why Attention is Not Explanation: Surgical Intervention and Causal Reasoning about Neural Models, 2020, LREC.

[16] Noah A. Smith, et al. Is Attention Interpretable?, 2019, ACL.

[17] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[18] Ion Androutsopoulos, et al. Paragraph-level Rationale Extraction through Regularization: A case study on European Court of Human Rights Cases, 2021, NAACL.

[19] Diyi Yang, et al. Hierarchical Attention Networks for Document Classification, 2016, NAACL.

[20] Matthijs Douze, et al. FastText.zip: Compressing text classification models, 2016, ArXiv.

[21] André F. T. Martins, et al. The Explanation Game: Towards Prediction Explainability through Sparse Communication, 2020, BlackboxNLP.

[22] Klaus-Robert Müller, et al. Explaining Predictions of Non-Linear Classifiers in NLP, 2016, Rep4NLP@ACL.

[23] Arman Cohan, et al. Longformer: The Long-Document Transformer, 2020, ArXiv.

[24] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[25] Xiaobing Sun, et al. Understanding Attention for Text Classification, 2020, ACL.

[26] Balaraman Ravindran, et al. Towards Transparent and Explainable Attention Models, 2020, ACL.

[27] Regina Barzilay, et al. Rationalizing Neural Predictions, 2016, EMNLP.

[28] Graciela Gonzalez-Hernandez, et al. Utilizing social media data for pharmacovigilance: A review, 2015, J. Biomed. Informatics.

[29] Tim Miller, et al. Explanation in Artificial Intelligence: Insights from the Social Sciences, 2017, Artif. Intell.

[30] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.

[31] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.

[32] Klaus-Robert Müller, et al. Explaining Recurrent Neural Network Predictions in Sentiment Analysis, 2017, WASSA@EMNLP.

[33] Ivan Titov, et al. Interpretable Neural Predictions with Differentiable Binary Variables, 2019, ACL.

[34] Byron C. Wallace, et al. Learning to Faithfully Rationalize by Construction, 2020, ACL.

[35] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[36] Petr Sojka, et al. Software Framework for Topic Modelling with Large Corpora, 2010.

[37] Yoshua Bengio, et al. Object Recognition with Gradient-Based Learning, 1999, Shape, Contour and Grouping in Computer Vision.

[38] Yuval Pinter, et al. Attention is not not Explanation, 2019, EMNLP.

[39] Steven Schockaert, et al. Interpretable Emoji Prediction via Label-Wise Attention LSTMs, 2018, EMNLP.

[40] Lidong Bing, et al. Recurrent Attention Network on Memory for Aspect Sentiment Analysis, 2017, EMNLP.

[41] Marko Robnik-Sikonja, et al. Explaining Classifications For Individual Instances, 2008, IEEE Transactions on Knowledge and Data Engineering.

[42] Yoav Goldberg, et al. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?, 2020, ACL.

[43] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[44] Zachary Chase Lipton. The mythos of model interpretability, 2016, ACM Queue.

[45] Thomas Wolf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.

[46] Martin Tutek, et al. Staying True to Your Word: (How) Can Attention Become Explanation?, 2020, RepL4NLP@ACL.

[47] Peter Szolovits, et al. MIMIC-III, a freely accessible critical care database, 2016, Scientific Data.

[48] Graham Neubig, et al. Learning to Deceive with Attention-Based Explanations, 2020, ACL.

[49] Dong Nguyen, et al. Comparing Automatic and Human Evaluation of Local Explanations for Text Classification, 2018, NAACL.

[50] Xiaoli Z. Fern, et al. Interpreting Recurrent and Attention-Based Neural Models: a Case Study on Natural Language Inference, 2018, EMNLP.

[51] Scott Lundberg, et al. A Unified Approach to Interpreting Model Predictions, 2017, NIPS.

[52] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.

[53] Jakob Grue Simonsen, et al. A Diagnostic Study of Explainability Techniques for Text Classification, 2020, EMNLP.

[54] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[55] Xinlei Chen, et al. Visualizing and Understanding Neural Models in NLP, 2015, NAACL.