WT5?! Training Text-to-Text Models to Explain their Predictions

Neural networks have recently achieved human-level performance on various challenging natural language processing (NLP) tasks, but it is notoriously difficult to understand why a neural network produced a particular prediction. In this paper, we leverage the text-to-text framework proposed by Raffel et al. (2019) to train language models to output a natural text explanation alongside their prediction. Crucially, this requires no modifications to the loss function or to the training and decoding procedures: we simply train the model to output the explanation after generating the (natural text) prediction. We show that this approach not only obtains state-of-the-art results on explainability benchmarks, but also permits learning from a limited set of labeled explanations and transferring rationalization abilities across datasets. To facilitate reproducibility and future work, we release the code used to train the models.
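As a rough illustration of the idea in the abstract, the sketch below formats one labeled example as a pair of plain strings, with the explanation appended after the prediction in the target. The helper names `make_example` and `parse_output`, the "explain" task prefix, and the " explanation: " separator are assumptions made for this illustration; the abstract itself does not specify the exact string format.

```python
from typing import Dict, Optional, Tuple

# Hypothetical separator between the prediction and its explanation.
SEPARATOR = " explanation: "


def make_example(task: str, text: str, label: str,
                 explanation: Optional[str] = None) -> Dict[str, str]:
    """Build an (inputs, targets) text pair for a text-to-text model.

    When an explanation is available, the target is the label followed by
    the explanation, so a standard sequence loss covers both parts.
    """
    if explanation is None:
        # No labeled explanation: train on the prediction alone.
        return {"inputs": f"{task}: {text}", "targets": label}
    return {
        "inputs": f"explain {task}: {text}",
        "targets": f"{label}{SEPARATOR}{explanation}",
    }


def parse_output(output: str) -> Tuple[str, Optional[str]]:
    """Split generated text back into (label, explanation or None)."""
    label, sep, explanation = output.partition(SEPARATOR)
    return label.strip(), (explanation.strip() if sep else None)


if __name__ == "__main__":
    ex = make_example("sentiment", "The acting was terrible.",
                      "negative", "the acting was terrible")
    print(ex["inputs"])
    print(ex["targets"])
    print(parse_output(ex["targets"]))
```

Because both kinds of example are ordinary string pairs, examples with and without explanations can be mixed in one training set, which is one way to picture learning from a limited set of labeled explanations.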

Colin Raffel | Sharan Narang | Adam Roberts | Katherine Lee | Noah Fiedel | Karishma Malkan

[1] Rico Sennrich, et al. Neural Machine Translation of Rare Words with Subword Units, 2015, ACL.

[2] Thomas Lukasiewicz, et al. e-SNLI: Natural Language Inference with Natural Language Explanations, 2018, NeurIPS.

[3] Motoaki Kawanabe, et al. How to Explain Individual Classification Decisions, 2009, J. Mach. Learn. Res.

[4] Mohit Bansal, et al. Adversarial NLI: A New Benchmark for Natural Language Understanding, 2020, ACL.

[5] Taku Kudo, et al. Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates, 2018, ACL.

[6] Taku Kudo, et al. SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing, 2018, EMNLP.

[7] Christopher Potts, et al. Learning Word Vectors for Sentiment Analysis, 2011, ACL.

[8] Dumitru Erhan, et al. The (Un)reliability of saliency methods, 2017, Explainable AI.

[9] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[10] Sebastian Ruder, et al. An Overview of Multi-Task Learning in Deep Neural Networks, 2017, ArXiv.

[11] Percy Liang, et al. Adversarial Examples for Evaluating Reading Comprehension Systems, 2017, EMNLP.

[12] Matt Post, et al. A Call for Clarity in Reporting BLEU Scores, 2018, WMT.

[13] Jascha Sohl-Dickstein, et al. Input Switched Affine Networks: An RNN Architecture Designed for Interpretability, 2016, ICML.

[14] Colin Raffel, et al. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, 2019, J. Mach. Learn. Res.

[15] Kevin Gimpel, et al. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations, 2019, ICLR.

[16] Hector J. Levesque, et al. The Winograd Schema Challenge, 2011, AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning.

[17] Franco Turini, et al. A Survey of Methods for Explaining Black Box Models, 2018, ACM Comput. Surv.

[18] Phil Blunsom, et al. A Convolutional Neural Network for Modelling Sentences, 2014, ACL.

[19] Omer Levy, et al. SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems, 2019, NeurIPS.

[20] Noam Shazeer, et al. Adafactor: Adaptive Learning Rates with Sublinear Memory Cost, 2018, ICML.

[21] Xiaodong Liu, et al. Multi-Task Deep Neural Networks for Natural Language Understanding, 2019, ACL.

[22] Jason Eisner, et al. Modeling Annotators: A Generative Approach to Learning from Annotator Rationales, 2008, EMNLP.

[23] Dumitru Erhan, et al. Evaluating Feature Importance Estimates, 2018, ArXiv.

[24] Andrew M. Dai, et al. Music Transformer: Generating Music with Long-Term Structure, 2018, ICLR.

[25] Quanshi Zhang, et al. Visual interpretability for deep learning: a survey, 2018, Frontiers of Information Technology & Electronic Engineering.

[26] Byron C. Wallace, et al. Attention is not Explanation, 2019, NAACL.

[27] Graham Neubig, et al. Learning to Deceive with Attention-Based Explanations, 2020, ACL.

[28] Samuel R. Bowman, et al. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference, 2017, NAACL.

[29] Qiang Yang, et al. An Overview of Multi-task Learning, 2018.

[30] Rich Caruana, et al. Multitask Learning, 1998, Encyclopedia of Machine Learning and Data Mining.

[31] Yonatan Belinkov, et al. Synthetic and Natural Noise Both Break Neural Machine Translation, 2017, ICLR.

[32] Been Kim, et al. Towards A Rigorous Science of Interpretable Machine Learning, 2017, arXiv:1702.08608.

[33] Douglas Eck, et al. Music Transformer, 2018, arXiv:1809.04281.

[34] Alex Graves, et al. Generating Sequences With Recurrent Neural Networks, 2013, ArXiv.

[35] Brandon M. Greenwell, et al. Interpretable Machine Learning, 2019, Hands-On Machine Learning with R.

[36] Lukasz Kaiser, et al. Attention is All you Need, 2017, NIPS.

[37] Colin Raffel, et al. Online and Linear-Time Attention by Enforcing Monotonic Alignments, 2017, ICML.

[38] Ashish Agarwal, et al. Hallucinations in Neural Machine Translation, 2018.

[39] Martin Wattenberg, et al. SmoothGrad: removing noise by adding noise, 2017, ArXiv.

[40] Richard Socher, et al. Explain Yourself! Leveraging Language Models for Commonsense Reasoning, 2019, ACL.

[41] Yoshua Bengio, et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, 2015, ICML.

[42] Arnold W. M. Smeulders, et al. i-RevNet: Deep Invertible Networks, 2018, ICLR.

[43] Ido Dagan, et al. The Third PASCAL Recognizing Textual Entailment Challenge, 2007, ACL-PASCAL@ACL.

[44] Sebastian Ruder, et al. Universal Language Model Fine-tuning for Text Classification, 2018, ACL.

[45] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.

[46] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[47] Noah A. Smith, et al. Is Attention Interpretable?, 2019, ACL.

[48] Byron C. Wallace, et al. ERASER: A Benchmark to Evaluate Rationalized NLP Models, 2020, ACL.

[49] Julian J. McAuley, et al. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering, 2016, WWW.

[50] Dan Roth, et al. Looking Beyond the Surface: A Challenge Set for Reading Comprehension over Multiple Sentences, 2018, NAACL.

[51] Christopher Potts, et al. A large annotated corpus for learning natural language inference, 2015, EMNLP.

[52] Anton van den Hengel, et al. Image-Based Recommendations on Styles and Substitutes, 2015, SIGIR.

[53] Ankur Taly, et al. Axiomatic Attribution for Deep Networks, 2017, ICML.

[54] Luke S. Zettlemoyer, et al. Deep Contextualized Word Representations, 2018, NAACL.

[55] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.