On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation

Recent advances in natural language processing rely on state-of-the-art models and datasets of extensive scale, which challenges the application of sample-based explanation methods in several respects, including explanation interpretability, efficiency, and faithfulness. In this work, for the first time, we improve the interpretability of explanations by allowing arbitrary text sequences to serve as the explanation unit. On top of this, we implement a Hessian-free method with a guarantee of faithfulness to the model. Finally, to compare our method with others, we propose a semantic-based evaluation metric that aligns better with human judgment of explanations than the widely adopted diagnostic or retraining measures. Empirical results on multiple real datasets demonstrate that the proposed method outperforms popular explanation techniques such as the Influence Function and TracIn on semantic evaluation.
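As background for the baselines named above, the sketch below illustrates how a sample-based explanation score is commonly computed in the TracIn style of [20]: the influence of a training example on a test prediction is approximated by the learning-rate-weighted dot product of their loss gradients, summed over saved checkpoints. This is a minimal illustration only; the model, loss function, and checkpoint list are placeholder assumptions, and it is not the Hessian-free method proposed in this paper.

import torch

def per_example_grad(model, loss_fn, x, y):
    # Flattened gradient of the loss on a single example w.r.t. all trainable parameters.
    model.zero_grad()
    loss = loss_fn(model(x), y)
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def tracin_score(model, loss_fn, checkpoints, learning_rates, train_example, test_example):
    # TracIn-style influence of one training example on one test example:
    # sum over saved checkpoints of learning_rate * <grad_train, grad_test>.
    score = 0.0
    for state_dict, lr in zip(checkpoints, learning_rates):
        model.load_state_dict(state_dict)
        g_train = per_example_grad(model, loss_fn, *train_example)
        g_test = per_example_grad(model, loss_fn, *test_example)
        score += lr * torch.dot(g_train, g_test).item()
    return score

Under this scheme, a positive score marks the training example as a proponent of the test prediction and a negative score marks it as an opponent.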

[1] S. Weisberg, et al. Residuals and Influence in Regression, 1982.

[2] Yoav Goldberg, et al. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?, 2020, ACL.

[3] Noah A. Smith, et al. Variational Pretraining for Semi-supervised Text Classification, 2019, ACL.

[4] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, arXiv.

[5] Barak A. Pearlmutter, et al. Tricks from Deep Learning, 2016, arXiv.

[6] Dumitru Erhan, et al. A Benchmark for Interpretability Methods in Deep Neural Networks, 2018, NeurIPS.

[7] Xipeng Qiu, et al. BERT-ATTACK: Adversarial Attack against BERT Using BERT, 2020, EMNLP.

[8] Byron C. Wallace, et al. Attention is not Explanation, 2019, NAACL.

[9] Ronan Le Bras, et al. Generative Data Augmentation for Commonsense Reasoning, 2020, EMNLP.

[10] Doug Downey, et al. G-DAug: Generative Data Augmentation for Commonsense Reasoning, 2020, Findings of EMNLP.

[11] Si Chen, et al. A structure-enhanced graph convolutional network for sentiment analysis, 2020, EMNLP.

[12] Yoav Goldberg, et al. Aligning Faithful Interpretations with their Social Attribution, 2020, arXiv.

[13] Pradeep Ravikumar, et al. Representer Point Selection for Explaining Deep Neural Networks, 2018, NeurIPS.

[14] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[15] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.

[16] Sungroh Yoon, et al. Interpretation of NLP Models through Input Marginalization, 2020, EMNLP.

[17] Kentaro Inui, et al. Evaluation Criteria for Instance-based Explanation, 2020, arXiv.

[18] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[19] Daniel Jurafsky, et al. Understanding Neural Networks through Representation Erasure, 2016, arXiv.

[20] Frederick Liu, et al. Estimating Training Data Influence by Tracking Gradient Descent, 2020, NeurIPS.

[21] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.

[22] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, arXiv.

[23] Thomas Wolf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, arXiv.

[24] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.

[25] Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.

[26] Guillaume Bouchard, et al. SentiHood: Targeted Aspect Based Sentiment Analysis Dataset for Urban Neighbourhoods, 2016, COLING.

[27] William W. Cohen, et al. Evaluating Explanations: How Much Do Explanations from the Teacher Aid Students?, 2020, TACL.

[28] Omer Levy, et al. What Does BERT Look at? An Analysis of BERT’s Attention, 2019, BlackboxNLP@ACL.

[29] Thomas Lukasiewicz, et al. e-SNLI: Natural Language Inference with Natural Language Explanations, 2018, NeurIPS.

[30] F. Hampel. The Influence Curve and Its Role in Robust Estimation, 1974.

[31] Eduard Hovy, et al. Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability, 2020, arXiv.

[32] Yuval Pinter, et al. Attention is not not Explanation, 2019, EMNLP.

[33] Samyadeep Basu, et al. Influence Functions in Deep Learning Are Fragile, 2020, ICLR.

[34] Yulia Tsvetkov, et al. Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions, 2020, ACL.

[35] Percy Liang, et al. On the Accuracy of Influence Functions for Measuring Group Effects, 2019, NeurIPS.

[36] Xiang Ao, et al. A Challenge Dataset and Effective Models for Aspect-Based Sentiment Analysis, 2019, EMNLP.

[37] Percy Liang, et al. Understanding Black-box Predictions via Influence Functions, 2017, ICML.