Why do you think that? Exploring faithful sentence-level rationales without supervision

Evaluating the trustworthiness of a model's prediction is essential for differentiating between 'right for the right reasons' and 'right for the wrong reasons'. Identifying the textual spans that determine the target label, known as faithful rationales, usually relies on pipeline approaches or reinforcement learning. However, such methods either require supervision, and thus costly annotation of the rationales, or employ non-differentiable models. We propose a differentiable training framework that creates models which output faithful rationales at the sentence level, using supervision on the target task alone. To achieve this, our model solves the task based on each rationale individually and learns to assign high scores to those that solve the task best. Our evaluation on three different datasets shows results competitive with a standard BERT black box, while exceeding a pipeline counterpart's performance in two cases. We further exploit the transparent decision-making process of these models to encourage selection of the correct rationales through direct supervision, thereby boosting performance at the rationale level.
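To make the abstract's core idea concrete, the sketch below shows one plausible reading of such a framework: each candidate sentence is encoded individually, a scorer assigns it a rationale weight, and the final prediction is the score-weighted mixture of per-sentence predictions, so task supervision alone pushes weight onto sentences that solve the task best. The class name, the GRU stand-in encoder (instead of the BERT encoder the paper implies), and the softmax-weighted aggregation are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SentenceRationaleModel(nn.Module):
    """Minimal sketch of a differentiable sentence-level rationale model.

    Each candidate sentence predicts the label on its own; a scorer assigns
    each sentence a relevance weight, and the document-level prediction is
    the weight-averaged mixture of the per-sentence predictions. Only the
    target task is supervised, so sentences whose individual predictions
    solve the task well receive high scores through the task gradient alone.
    """

    def __init__(self, hidden_dim: int, num_labels: int):
        super().__init__()
        # Stand-in sentence encoder for a runnable example; a pretrained
        # transformer encoder is assumed in practice.
        self.encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.scorer = nn.Linear(hidden_dim, 1)               # rationale score per sentence
        self.classifier = nn.Linear(hidden_dim, num_labels)  # per-sentence task head

    def forward(self, sentence_embeddings: torch.Tensor):
        # sentence_embeddings: (num_sentences, num_tokens, hidden_dim)
        _, h = self.encoder(sentence_embeddings)           # (1, num_sentences, hidden_dim)
        sent_repr = h.squeeze(0)                           # (num_sentences, hidden_dim)

        scores = self.scorer(sent_repr).squeeze(-1)        # (num_sentences,)
        weights = F.softmax(scores, dim=-1)                # differentiable "selection"

        per_sentence_logits = self.classifier(sent_repr)   # (num_sentences, num_labels)
        # Mixture of per-sentence predictions: the task loss gradient raises
        # the weight of sentences whose individual prediction is correct.
        logits = (weights.unsqueeze(-1) * per_sentence_logits).sum(dim=0)
        return logits, weights


# Usage: 4 candidate sentences of 12 tokens with 64-dim embeddings, 3 labels.
model = SentenceRationaleModel(hidden_dim=64, num_labels=3)
sentences = torch.randn(4, 12, 64)
logits, rationale_weights = model(sentences)
loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([1]))
loss.backward()              # task supervision trains scorer and classifier jointly
print(rationale_weights)     # the highest-weighted sentence serves as the rationale
```

At inference time, the highest-scoring sentence can be read off as the model's rationale; because that sentence alone produced the prediction it supports, the explanation is faithful by construction.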
