Rationales for Sequential Predictions

Sequence models are a critical component of modern NLP systems, but their predictions are difficult to explain. We consider model explanations through rationales, subsets of context that can explain individual model predictions. We find sequential rationales by solving a combinatorial optimization: the best rationale is the smallest subset of input tokens that would predict the same output as the full sequence. Enumerating all subsets is intractable, so we propose an efficient greedy algorithm to approximate this objective. The algorithm, called greedy rationalization, applies to any model. For this approach to be effective, the model should form compatible conditional distributions when making predictions on incomplete subsets of the context. This condition can be enforced with a short fine-tuning step. We study greedy rationalization on language modeling and machine translation. Compared to existing baselines, greedy rationalization is best at optimizing the combinatorial objective and provides the most faithful rationales. On a new dataset of annotated sequential rationales, greedy rationales are most similar to human rationales.

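For intuition, below is a minimal Python sketch of the greedy search described above, not the authors' reference implementation. It assumes a hypothetical callable `next_token_log_probs(subset)` that returns the model's log-probabilities over the next token when conditioned only on the context positions in `subset` (which in turn presumes the compatibility fine-tuning mentioned in the abstract).

```python
def greedy_rationalization(next_token_log_probs, num_context_tokens, target_id, max_steps=None):
    """Greedy approximation to the smallest-subset rationale objective.

    next_token_log_probs(subset): hypothetical interface that returns a list of
    log-probabilities over the vocabulary for the next token, conditioning the
    model only on the context positions in `subset`.
    target_id: the token the model predicts from the full context.
    """
    rationale = []
    remaining = set(range(num_context_tokens))
    max_steps = max_steps or num_context_tokens

    for _ in range(max_steps):
        if not remaining:
            break
        # Greedy step: add the single context token that most increases the
        # probability of the full-context prediction.
        best = max(
            remaining,
            key=lambda pos: next_token_log_probs(sorted(rationale + [pos]))[target_id],
        )
        rationale = sorted(rationale + [best])
        remaining.remove(best)

        # Stop as soon as the rationale alone yields the same argmax
        # prediction as the full context.
        log_probs = next_token_log_probs(rationale)
        if max(range(len(log_probs)), key=log_probs.__getitem__) == target_id:
            break

    return rationale
```

Each greedy step evaluates every remaining candidate, so finding a size-k rationale over T context tokens costs roughly k · T model calls; in practice these candidate evaluations can be batched.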