Question Answering from Unstructured Text by Retrieval and Comprehension

Open-domain Question Answering (QA) systems must interact with external knowledge sources, such as web pages, to find relevant information. Information sources like Wikipedia, however, are not well structured and are difficult to utilize compared with Knowledge Bases (KBs). In this work we present a two-step approach to question answering from unstructured text, consisting of a retrieval step and a comprehension step. For comprehension, we present an RNN-based attention model with a novel mixture mechanism for selecting answers either from retrieved articles or from a fixed vocabulary. For retrieval, we introduce a hand-crafted model and a neural model for ranking relevant articles. We achieve state-of-the-art performance on the WikiMovies dataset, reducing the error by 40%. Our experimental results further demonstrate the importance of each of the introduced components.
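The mixture mechanism described above can be sketched as a gated combination of a pointer (attention) distribution over tokens in the retrieved article and a softmax over a fixed vocabulary. The following minimal numpy sketch illustrates the general idea; all shapes, names, and the dot-product scoring are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array of scores.
    e = np.exp(x - x.max())
    return e / e.sum()

def mixture_answer_distribution(query, token_states, token_ids,
                                vocab_logits, gate_w):
    """Mix a pointer distribution over article tokens with a fixed-vocabulary
    softmax, weighted by a learned scalar gate (hypothetical formulation)."""
    # Pointer/attention distribution over the retrieved article's tokens.
    attn = softmax(token_states @ query)      # shape: (num_tokens,)
    # Distribution over the fixed answer vocabulary.
    vocab_p = softmax(vocab_logits)           # shape: (vocab_size,)
    # Gate in (0, 1): how much probability mass to place on the pointer.
    g = 1.0 / (1.0 + np.exp(-(gate_w @ query)))
    # Scatter pointer mass onto vocabulary positions, then mix.
    mixed = (1.0 - g) * vocab_p
    for p, tid in zip(attn, token_ids):
        mixed[tid] += g * p
    return mixed                              # sums to 1 over the vocabulary
```

Because repeated tokens in the article scatter-add onto the same vocabulary entry, a frequently mentioned candidate answer accumulates mass, similar in spirit to attention-sum readers and pointer-sentinel mixtures.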
