Interpreting Predictions of NLP Models

Although neural NLP models are highly expressive and empirically successful, they also systematically fail in counterintuitive ways and are opaque in their decision-making process. This tutorial will provide a background on interpretation techniques, i.e., methods for explaining the predictions of NLP models. We will first situate example-specific interpretations in the context of other ways to understand models (e.g., probing, dataset analyses). Next, we will present a thorough study of example-specific interpretations, including saliency maps, input perturbations (e.g., LIME, input reduction), adversarial attacks, and influence functions. Alongside these descriptions, we will walk through source code that creates and visualizes interpretations for a diverse set of NLP tasks. Finally, we will discuss open problems in the field, e.g., evaluating, extending, and improving interpretation methods.
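To make the saliency-map discussion concrete, below is a minimal sketch (not the tutorial's own code) of a gradient-based saliency map: each input token's importance is taken to be the norm of the gradient of the predicted class score with respect to that token's embedding. The toy classifier, vocabulary size, and token IDs are hypothetical stand-ins.

```python
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    """Tiny embedding + linear classifier used only for illustration."""
    def __init__(self, vocab_size=100, embed_dim=16, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        embeds = self.embed(token_ids)                 # (seq_len, embed_dim)
        embeds.retain_grad()                           # keep gradients for this non-leaf tensor
        logits = self.classifier(embeds.mean(dim=0))   # (num_classes,)
        return logits, embeds

model = ToyClassifier()
token_ids = torch.tensor([3, 17, 42, 8])               # hypothetical tokenized sentence
logits, embeds = model(token_ids)
pred_class = logits.argmax()

# Backpropagate the predicted class score to the input embeddings;
# the L2 norm of each token's embedding gradient serves as its saliency score.
logits[pred_class].backward()
saliency = embeds.grad.norm(dim=-1)
print((saliency / saliency.sum()).tolist())            # normalized token importances
```

The same gradient-with-respect-to-embeddings recipe carries over to large pretrained models, and variants such as SmoothGrad and Integrated Gradients reduce noise by averaging these gradients over perturbed or interpolated copies of the input.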
