Latent Retrieval for Weakly Supervised Open Domain Question Answering

Recent work on open domain question answering (QA) assumes strong supervision of the supporting evidence and/or assumes a black-box information retrieval (IR) system to retrieve evidence candidates. We argue that both are suboptimal, since gold evidence is not always available, and QA is fundamentally different from IR. We show for the first time that it is possible to jointly learn the retriever and reader from question-answer string pairs alone, without any IR system. In this setting, evidence retrieval from all of Wikipedia is treated as a latent variable. Since this is impractical to learn from scratch, we pre-train the retriever with an Inverse Cloze Task. We evaluate on open versions of five QA datasets. On datasets where the questioner already knows the answer, a traditional IR system such as BM25 is sufficient. On datasets where a user is genuinely seeking an answer, we show that learned retrieval is crucial, outperforming BM25 by up to 19 points in exact match.
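As a rough sketch of the latent-variable formulation described above (the specific symbols, scoring function, and top-k approximation are assumptions, since the abstract does not spell them out), the marginal likelihood of an answer string a given a question q can be written as:

    P(a \mid q) \;=\; \sum_{z \in \mathcal{Z}} P_\theta(z \mid q)\, P_\phi(a \mid q, z),
    \qquad
    P_\theta(z \mid q) \;=\; \frac{\exp S_{\mathrm{retr}}(z, q)}{\sum_{z' \in \mathcal{Z}} \exp S_{\mathrm{retr}}(z', q)}

where \mathcal{Z} is the set of evidence blocks drawn from all of Wikipedia, z is the latent retrieved block, P_\phi(a \mid q, z) is the reader, and S_{\mathrm{retr}} is the learned retrieval score. Under this reading, the Inverse Cloze Task would pre-train S_{\mathrm{retr}} by treating a sentence as a pseudo-question whose target evidence is the passage it was removed from, and the sum over all of Wikipedia would presumably be approximated by the top-k retrieved blocks at training time.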
