On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation

Recent advances in natural language processing have been driven by state-of-the-art models and datasets of very large scale, which challenges the application of sample-based explanation methods in several respects, including explanation interpretability, efficiency, and faithfulness. In this work, we improve the interpretability of explanations by, for the first time, allowing arbitrary text sequences to serve as the explanation unit. On top of this, we implement a Hessian-free method that comes with a guarantee of faithfulness to the model. Finally, to compare our method with others, we propose a semantics-based evaluation metric that aligns better with human judgment of explanations than the widely adopted diagnostic or retraining measures. Empirical results on multiple real datasets show that, under semantic evaluation, the proposed method outperforms popular explanation techniques such as Influence Functions and TracIn.
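
For background, the two baseline attribution scores named above are standardly defined as follows; this is only a sketch of the published definitions of Influence Functions and TracIn (both cited in the reference list below), not a statement of this paper's method. Here $z$ denotes a training example, $z_{\text{test}}$ a test example, $L$ the per-example loss, $\hat{\theta}$ the trained parameters, $H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n}\nabla^2_{\theta} L(z_i, \hat{\theta})$ the empirical Hessian of the training loss, and $\theta_t$, $\eta_t$ the parameters and learning rate at checkpoint $t$:

$$\mathcal{I}(z, z_{\text{test}}) = -\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\, \nabla_{\theta} L(z, \hat{\theta})$$

$$\mathrm{TracIn}(z, z_{\text{test}}) = \sum_{t} \eta_t \, \nabla_{\theta} L(z, \theta_t) \cdot \nabla_{\theta} L(z_{\text{test}}, \theta_t)$$

The Hessian-inverse term in the first expression is the computation that a Hessian-free approach, such as the one proposed here, avoids.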

[1] William W. Cohen, et al. Evaluating Explanations: How Much Do Explanations from the Teacher Aid Students?, 2020, TACL.

[2] Si Chen, et al. A Structure-Enhanced Graph Convolutional Network for Sentiment Analysis, 2020, EMNLP.

[3] Sungroh Yoon, et al. Interpretation of NLP Models through Input Marginalization, 2020, EMNLP.

[4] Yiding Hao, et al. Evaluating Attribution Methods using White-Box LSTMs, 2020, BlackboxNLP@EMNLP.

[5] Eduard Hovy, et al. Pair the Dots: Jointly Examining Training History and Test Stimuli for Model Interpretability, 2020, ArXiv.

[6] Samyadeep Basu, et al. Influence Functions in Deep Learning Are Fragile, 2020, ICLR.

[7] Kentaro Inui, et al. Evaluation Criteria for Instance-based Explanation, 2020, ArXiv.

[8] Yoav Goldberg, et al. Aligning Faithful Interpretations with their Social Attribution, 2020, TACL.

[9] Yulia Tsvetkov, et al. Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions, 2020, ACL.

[10] Ronan Le Bras, et al. G-DAug: Generative Data Augmentation for Commonsense Reasoning, 2020, Findings of EMNLP.

[11] Yoav Goldberg, et al. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness?, 2020, ACL.

[12] Xipeng Qiu, et al. BERT-ATTACK: Adversarial Attack against BERT Using BERT, 2020, EMNLP.

[13] Frederick Liu, et al. Estimating Training Data Influence by Tracking Gradient Descent, 2020, NeurIPS.

[14] Xiang Ao, et al. A Challenge Dataset and Effective Models for Aspect-Based Sentiment Analysis, 2019, EMNLP.

[15] Rémi Louf, et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing, 2019, ArXiv.

[16] Yuval Pinter, et al. Attention is not not Explanation, 2019, EMNLP.

[17] Omer Levy, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach, 2019, ArXiv.

[18] Omer Levy, et al. What Does BERT Look at? An Analysis of BERT's Attention, 2019, BlackboxNLP@ACL.

[19] Noah A. Smith, et al. Variational Pretraining for Semi-supervised Text Classification, 2019, ACL.

[20] Noah A. Smith, et al. Is Attention Interpretable?, 2019, ACL.

[21] Percy Liang, et al. On the Accuracy of Influence Functions for Measuring Group Effects, 2019, NeurIPS.

[22] Byron C. Wallace, et al. Attention is not Explanation, 2019, NAACL.

[23] Thomas Lukasiewicz, et al. e-SNLI: Natural Language Inference with Natural Language Explanations, 2018, NeurIPS.

[24] Pradeep Ravikumar, et al. Representer Point Selection for Explaining Deep Neural Networks, 2018, NeurIPS.

[25] D. Erhan, et al. A Benchmark for Interpretability Methods in Deep Neural Networks, 2018, NeurIPS.

[26] Frank Hutter, et al. Decoupled Weight Decay Regularization, 2017, ICLR.

[27] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[28] Percy Liang, et al. Understanding Black-box Predictions via Influence Functions, 2017, ICML.

[29] Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.

[30] Daniel Jurafsky, et al. Understanding Neural Networks through Representation Erasure, 2016, ArXiv.

[31] Barak A. Pearlmutter, et al. Tricks from Deep Learning, 2016, ArXiv.

[32] Guillaume Bouchard, et al. SentiHood: Targeted Aspect Based Sentiment Analysis Dataset for Urban Neighbourhoods, 2016, COLING.

[33] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.

[34] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.

[35] S. Weisberg, et al. Residuals and Influence in Regression, 1982.

[36] F. Hampel. The Influence Curve and Its Role in Robust Estimation, 1974.

[37] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[38] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.