Quasar: Datasets for Question Answering by Search and Reading

We present two new large-scale datasets aimed at evaluating systems designed to comprehend a natural language query and extract its answer from a large corpus of text. The Quasar-S dataset consists of 37000 cloze-style (fill-in-the-gap) queries constructed from definitions of software entity tags on the popular website Stack Overflow. The posts and comments on the website serve as the background corpus for answering the cloze questions. The Quasar-T dataset consists of 43000 open-domain trivia questions and their answers obtained from various internet sources. ClueWeb09 serves as the background corpus for extracting these answers. We pose these datasets as a challenge for two related subtasks of factoid Question Answering: (1) searching for relevant pieces of text that include the correct answer to a query, and (2) reading the retrieved text to answer the query. We also describe a retrieval system for extracting relevant sentences and documents from the corpus given a query, and include these in the release for researchers wishing to only focus on (2). We evaluate several baselines on both datasets, ranging from simple heuristics to powerful neural models, and show that these lag behind human performance by 16.4% and 32.1% for Quasar-S and -T respectively. The datasets are available at this https URL .

[1]  Ellen M. Voorhees,et al.  Building a question answering test collection , 2000, SIGIR '00.

[2]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[3]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[4]  Matthew Richardson,et al.  MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text , 2013, EMNLP.

[5]  Rahul Gupta,et al.  Knowledge base completion via search-based question answering , 2014, WWW.

[6]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.

[7]  Yi Yang,et al.  WikiQA: A Challenge Dataset for Open-Domain Question Answering , 2015, EMNLP.

[8]  Jian Zhang,et al.  SQuAD: 100,000+ Questions for Machine Comprehension of Text , 2016, EMNLP.

[9]  Jason Weston,et al.  The Goldilocks Principle: Reading Children's Books with Explicit Memory Representations , 2015, ICLR.

[10]  David A. McAllester,et al.  Who did What: A Large-Scale Person-Centered Cloze Dataset , 2016, EMNLP.

[11]  Jianfeng Gao,et al.  A Human Generated MAchine Reading COmprehension Dataset , 2018 .

[12]  Danqi Chen,et al.  A Thorough Examination of the CNN/Daily Mail Reading Comprehension Task , 2016, ACL.

[13]  Jason Weston,et al.  Key-Value Memory Networks for Directly Reading Documents , 2016, EMNLP.

[14]  Ruslan Salakhutdinov,et al.  Gated-Attention Readers for Text Comprehension , 2016, ACL.

[15]  Jason Weston,et al.  Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.

[16]  Ali Farhadi,et al.  Bidirectional Attention Flow for Machine Comprehension , 2016, ICLR.

[17]  Kyunghyun Cho,et al.  SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine , 2017, ArXiv.

[18]  Philip Bachman,et al.  NewsQA: A Machine Comprehension Dataset , 2016, Rep4NLP@ACL.

[19]  Ruslan Salakhutdinov,et al.  Question Answering from Unstructured Text by Retrieval and Comprehension , 2017, ArXiv.

[20]  Dirk Weissenborn,et al.  FastQA: A Simple and Efficient Neural Architecture for Question Answering , 2017, ArXiv.

[21]  Eunsol Choi,et al.  TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension , 2017, ACL.