SQuAD: 100,000+ Questions for Machine Comprehension of Text

We present the Stanford Question Answering Dataset (SQuAD), a new reading comprehension dataset consisting of 100,000+ questions posed by crowdworkers on a set of Wikipedia articles, where the answer to each question is a segment of text from the corresponding reading passage. We analyze the dataset to understand the types of reasoning required to answer the questions, leaning heavily on dependency and constituency trees. We build a strong logistic regression model, which achieves an F1 score of 51.0%, a significant improvement over a simple baseline (20%). However, human performance (86.8%) is much higher, indicating that the dataset presents a good challenge problem for future research. The dataset is freely available at https://stanford-qa.com.
