Analysis of Wikipedia-based Corpora for Question Answering

This paper gives comprehensive analyses of Wikipedia-based corpora for several question answering tasks. Four recent corpora, WikiQA, SelQA, SQuAD, and InfoQA, are collected and first analyzed intrinsically in terms of contextual similarity, question types, and answer categories. They are then analyzed extrinsically through three question answering tasks: answer retrieval, answer selection, and answer triggering. We also present an indexing-based method for creating a silver-standard dataset for answer retrieval using the entire Wikipedia. Our analysis shows the distinctive characteristics of each corpus and suggests how these corpora can be better used for statistical question answering learning.
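To make the two measurement ideas above concrete, here is a toy sketch. It pairs a word-overlap ratio, used as one simple proxy for question/answer contextual similarity, with a tiny inverted index over paragraphs. The sketch then labels a question as a silver-standard retrieval example when its gold answer sentence appears in one of the top-k retrieved paragraphs. It is not the paper's implementation. The helper names (`tokenize`, `overlap_ratio`, `ParagraphIndex`, `silver_label`), the stopword list, the distinct-word scoring, and the top-k containment rule are all hypothetical. An indexer over the entire Wikipedia would presumably rely on a full-text search engine such as Lucene, and the paper's actual criteria may differ.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "a", "an", "of", "in", "is", "was", "what", "who", "when",
             "where", "which", "how", "to", "and", "did", "does"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def overlap_ratio(question, sentence):
    """Fraction of the question's content words that also occur in the sentence
    (a simple stand-in for contextual similarity, assumed for this sketch)."""
    q = set(tokenize(question))
    if not q:
        return 0.0
    return len(q & set(tokenize(sentence))) / len(q)


class ParagraphIndex:
    """Minimal in-memory inverted index: content word -> set of paragraph ids."""

    def __init__(self, paragraphs):
        self.paragraphs = list(paragraphs)
        self.postings = defaultdict(set)
        for pid, text in enumerate(self.paragraphs):
            for tok in set(tokenize(text)):
                self.postings[tok].add(pid)

    def search(self, query, k=3):
        """Rank paragraphs by how many distinct query words they contain;
        ties are broken by paragraph id. Returns at most k ids."""
        scores = defaultdict(int)
        for tok in set(tokenize(query)):
            for pid in self.postings.get(tok, ()):
                scores[pid] += 1
        ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
        return ranked[:k]


def silver_label(index, question, answer_sentence, k=3):
    """Hypothetical silver-standard rule: return the id of the first top-k
    paragraph that contains the gold answer sentence verbatim, else None."""
    for pid in index.search(question, k):
        if answer_sentence in index.paragraphs[pid]:
            return pid
    return None


if __name__ == "__main__":
    paragraphs = [
        "Paris is the capital of France. It lies on the Seine.",
        "Berlin is the capital of Germany.",
        "The Seine flows through Paris.",
    ]
    index = ParagraphIndex(paragraphs)
    print(overlap_ratio("What is the capital of France?",
                        "Berlin is the capital of Germany."))  # → 0.5
    print(silver_label(index, "What is the capital of France?",
                       "Paris is the capital of France."))  # → 0
```

In this toy run, the question keeps only the content words "capital" and "france". Paragraph 0 matches both and is ranked first, so the question receives a silver label pointing to that paragraph. A question whose content words match no paragraph receives `None` and would be left out of the silver-standard set.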
