QASC: A Dataset for Question Answering via Sentence Composition

Composing knowledge from multiple pieces of text is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The latter makes retrieval challenging, as the system must introduce new concepts or relations in order to discover potential decompositions. The reasoning model must then learn to identify valid compositions of the retrieved facts using common-sense reasoning. To help address these challenges, we provide annotations of the supporting facts as well as their composition. Guided by these annotations, we present a two-step approach that mitigates the retrieval challenge, and we use other multiple-choice datasets as additional training data to strengthen the reasoning model. Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The retrieval and reasoning problems, however, remain unsolved: this model still lags 20% behind human performance.
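
To make the two-step retrieval idea concrete, the sketch below first retrieves candidate facts for the question and answer, then issues a second query built from the novel terms each retrieved fact introduces, so the second fact can bridge back to the question. This is a minimal illustration, not the paper's actual implementation: the two_step_retrieve function, the term-overlap heuristic, and the search(query, k) retriever (e.g., any BM25-style index) are all hypothetical stand-ins.

    # Hypothetical sketch of two-step retrieval for fact composition.
    # `search(query, k)` is an assumed stand-in for a corpus retriever;
    # it is not the paper's actual retrieval implementation.

    def two_step_retrieve(question, answer, search, k=10):
        q_terms = set(question.lower().split()) | set(answer.lower().split())

        # Step 1: retrieve facts that overlap with the question/answer.
        first_facts = search(question + " " + answer, k)

        pairs = []
        for f1 in first_facts:
            # Novel terms introduced by the first fact hint at the
            # intermediate concept needed to complete the composition.
            novel = set(f1.lower().split()) - q_terms

            # Step 2: query with the novel terms plus the answer so the
            # second fact can connect the new concept to the answer.
            second_facts = search(" ".join(novel) + " " + answer, k)
            pairs.extend((f1, f2) for f2 in second_facts)
        return pairs

A downstream reasoning model would then score each retrieved (f1, f2) pair jointly with the question and candidate answer to decide whether the two facts compose into a valid justification.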
