Explanations for CommonsenseQA: New Dataset and Models

The CommonsenseQA (CQA) dataset (Talmor et al., 2019) was recently released to advance research on the common-sense question answering (QA) task. Whereas prior work has mostly focused on proposing QA models for this dataset, our aim is to retrieve as well as generate explanations for a given (question, correct answer choice, incorrect answer choices) tuple from this dataset. Our definition of an explanation is based on certain desiderata, and translates an explanation into a set of positive and negative common-sense properties (i.e., facts) that not only explain the correct answer choice but also refute the incorrect ones. We human-annotate a first-of-its-kind dataset (called ECQA) of positive and negative properties, as well as free-flow explanations, for 11K QA pairs taken from the CQA dataset. We propose a latent-representation-based property retrieval model as well as a GPT-2 based property generation model with a novel two-step fine-tuning procedure. We also propose a free-flow explanation generation model. Extensive experiments show that our retrieval model beats the BM25 baseline by a relative gain of 100% in F1 score, our property generation model achieves a respectable F1 score of 36.4, and our free-flow generation model achieves a similarity score of 61.9, where the last two scores are based on a human-correlated semantic similarity metric.
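The abstract does not spell out any of the models, but the BM25 baseline that the retrieval model is compared against is a well-documented ranking function (see [30]). The sketch below is a minimal, self-contained Okapi BM25 ranker over candidate property sentences; the `BM25` class, the toy property corpus, and whitespace tokenization are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

class BM25:
    """Minimal Okapi BM25 ranker (Robertson & Zaragoza, 2009)."""

    def __init__(self, corpus, k1=1.5, b=0.75):
        # corpus: one token list per candidate property sentence
        self.corpus = corpus
        self.k1, self.b = k1, b
        self.doc_len = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_len) / len(corpus)
        self.term_freqs = [Counter(doc) for doc in corpus]
        df = Counter()  # document frequency of each term
        for doc in corpus:
            df.update(set(doc))
        n = len(corpus)
        # Smoothed IDF; the +1 keeps it non-negative for very common terms.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1)
                    for t, f in df.items()}

    def score(self, query, idx):
        """BM25 score of candidate `idx` for a tokenized query."""
        freqs, dl = self.term_freqs[idx], self.doc_len[idx]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(self.idf[t] * freqs[t] * (self.k1 + 1) / (freqs[t] + norm)
                   for t in query if t in freqs)

    def rank(self, query, top_k=3):
        """Return the top_k (score, index) pairs, best first."""
        scores = [(self.score(query, i), i) for i in range(len(self.corpus))]
        return sorted(scores, reverse=True)[:top_k]

# Toy example: rank hypothetical common-sense properties for a CQA-style query.
properties = [
    "a bank is a building where people deposit and withdraw money",
    "a river bank is the sloped ground beside a river",
    "money is used to buy goods and services",
]
bm25 = BM25([p.split() for p in properties])
query = "where do people deposit money".split()
print(bm25.rank(query, top_k=2))  # the financial-bank property should score highest
```

A purely lexical scorer like this misses properties that paraphrase the question, which is the gap the paper's latent-representation retrieval model targets and the source of the reported 100% relative F1 gain over BM25.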

[1] Eric Nyberg, et al. Towards Generalizable Neuro-Symbolic Systems for Commonsense Question Answering, 2019, EMNLP.

[2] Quoc V. Le, et al. Neural Symbolic Reader: Scalable Integration of Distributed and Symbolic Representations for Reading Comprehension, 2020, ICLR.

[3] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[4] Peter Clark, et al. Transformers as Soft Reasoners over Language, 2020, arXiv.

[5] Yejin Choi, et al. MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms, 2019, NAACL.

[6] Eneko Agirre, et al. SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity, 2012, SemEval.

[7] Richard Socher, et al. Explain Yourself! Leveraging Language Models for Commonsense Reasoning, 2019, ACL.

[8] Yoshua Bengio, et al. HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering, 2018, EMNLP.

[9] Nelson F. Liu, et al. Crowdsourcing Multiple Choice Science Questions, 2017, W-NUT@EMNLP.

[10] A. Feinstein, et al. High Agreement but Low Kappa: I. The Problems of Two Paradoxes, 1990, Journal of Clinical Epidemiology.

[11] Alon Lavie, et al. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments, 2005, ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for MT and/or Summarization.

[12] Harold W. Kuhn. The Hungarian Method for the Assignment Problem, 1955, Naval Research Logistics Quarterly.

[13] Matthew Henderson, et al. Efficient Natural Language Response Suggestion for Smart Reply, 2017, arXiv.

[14] Ronan Le Bras, et al. Generative Data Augmentation for Commonsense Reasoning, 2020, Findings of EMNLP.

[15] Keh-Yih Su, et al. A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers, 2020, ACL.

[16] Shalini Ghosh, et al. Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention, 2019, arXiv.

[17] Jonathan Berant, et al. CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge, 2019, NAACL.

[18] Andrew Chou, et al. Semantic Parsing on Freebase from Question-Answer Pairs, 2013, EMNLP.

[19] Hannaneh Hajishirzi, et al. UnifiedQA: Crossing Format Boundaries With a Single QA System, 2020, Findings of EMNLP.

[20] Hai Zhao, et al. Retrospective Reader for Machine Reading Comprehension, 2020, AAAI.

[21] C. Lawrence Zitnick, et al. CIDEr: Consensus-based Image Description Evaluation, 2015, CVPR.

[22] Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries, 2004, Text Summarization Branches Out (ACL Workshop).

[23] Doug Downey, et al. Abductive Commonsense Reasoning, 2019, ICLR.

[24] Basura Fernando, et al. SPICE: Semantic Propositional Image Caption Evaluation, 2016, ECCV.

[25] Jian Zhang, et al. SQuAD: 100,000+ Questions for Machine Comprehension of Text, 2016, EMNLP.

[26] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.

[27] Helmut Horacek. Requirements for Conceptual Representations of Explanations and How Reasoning Systems Can Serve Them, 2017.

[28] Francesca Toni, et al. Explainable Automated Fact-Checking for Public Health Claims, 2020, EMNLP.

[29] Mihai Surdeanu, et al. Unsupervised Alignment-based Iterative Evidence Retrieval for Multi-hop Question Answering, 2020, ACL.

[30] Hugo Zaragoza, et al. The Probabilistic Relevance Framework: BM25 and Beyond, 2009, Foundations and Trends in Information Retrieval.

[31] Jens Lehmann, et al. LC-QuAD: A Corpus for Complex Question Answering over Knowledge Graphs, 2017, ISWC.

[32] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[33] Eunsol Choi, et al. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension, 2017, ACL.

[34] Peter Clark, et al. Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, 2018, EMNLP.

[35] Ming-Wei Chang, et al. Natural Questions: A Benchmark for Question Answering Research, 2019, TACL.

[36] Tushar Khot, et al. QASC: A Dataset for Question Answering via Sentence Composition, 2020, AAAI.

[37] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.

[38] Elena Cabrio, et al. Question Answering over Linked Data (QALD-5), 2014, CLEF.

[39] Yue Zhang, et al. Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation, 2019, ACL.

[40] L. Venkata Subramaniam, et al. Translucent Answer Predictions in Multi-Hop Reading Comprehension, 2020, AAAI.

[41] Peter A. Jansen, et al. WorldTree V2: A Corpus of Science-Domain Structured Explanations and Inference Patterns Supporting Multi-Hop Inference, 2020, LREC.

[42] Clayton T. Morrison, et al. WorldTree: A Corpus of Explanation Graphs for Elementary Science Questions Supporting Multi-hop Inference, 2018, LREC.

[43] Wang Ling, et al. Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, 2017, ACL.

[44] Miguel Angel Ríos Gaona. Methods for Measuring Semantic Similarity of Texts, 2014.

[45] Yu Cheng, et al. FreeLB: Enhanced Adversarial Training for Natural Language Understanding, 2020, ICLR.