ERASER: A Benchmark to Evaluate Rationalized NLP Models

State-of-the-art models in NLP are now predominantly based on deep neural networks that are opaque in terms of how they come to make predictions. This limitation has increased interest in designing more interpretable deep models for NLP that reveal the ‘reasoning’ behind model outputs. But work in this direction has been conducted on different datasets and tasks with correspondingly unique aims and metrics; this makes it difficult to track progress. We propose the Evaluating Rationales And Simple English Reasoning (ERASER) benchmark to advance research on interpretable models in NLP. This benchmark comprises multiple datasets and tasks for which human annotations of “rationales” (supporting evidence) have been collected. We propose several metrics that aim to capture how well the rationales provided by models align with human rationales, and also how faithful these rationales are (i.e., the degree to which provided rationales influenced the corresponding predictions). Our hope is that releasing this benchmark facilitates progress on designing more interpretable NLP systems. The benchmark, code, and documentation are available at https://www.eraserbenchmark.com/.
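The abstract describes two families of metrics without spelling them out. The sketch below is only an illustration of what such metrics can look like, assuming token-level F1 overlap for agreement with human rationales and comprehensiveness/sufficiency-style probability differences for faithfulness; it is not the released ERASER scoring code, and the prob_label callable is a hypothetical stand-in for a model's predicted-class probability.

# Illustrative sketch only; assumes token-level F1 for agreement and
# comprehensiveness/sufficiency-style scores for faithfulness.
from typing import Callable, Sequence, Set


def rationale_token_f1(predicted: Set[int], human: Set[int]) -> float:
    """F1 overlap between predicted and human rationale token indices."""
    if not predicted and not human:
        return 1.0
    if not predicted or not human:
        return 0.0
    tp = len(predicted & human)
    precision = tp / len(predicted)
    recall = tp / len(human)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def comprehensiveness(prob_label: Callable[[Sequence[str]], float],
                      tokens: Sequence[str],
                      rationale: Set[int]) -> float:
    """Drop in predicted-label probability when rationale tokens are removed.

    prob_label is a hypothetical function returning the model's probability
    for the predicted label given a token sequence. A large drop suggests the
    rationale actually influenced the prediction."""
    kept = [t for i, t in enumerate(tokens) if i not in rationale]
    return prob_label(tokens) - prob_label(kept)


def sufficiency(prob_label: Callable[[Sequence[str]], float],
                tokens: Sequence[str],
                rationale: Set[int]) -> float:
    """Drop in predicted-label probability when only rationale tokens are kept.

    A small drop suggests the rationale alone supports the prediction."""
    kept = [t for i, t in enumerate(tokens) if i in rationale]
    return prob_label(tokens) - prob_label(kept)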
