Retrieve, Rerank, Read, then Iterate: Answering Open-Domain Questions of Arbitrary Complexity from Text

Current approaches to open-domain question answering often make crucial assumptions that prevent them from generalizing to real-world settings, including access to parameterized retrieval systems well-tuned for the task, access to structured metadata like knowledge bases and web links, or a priori knowledge of the complexity of the questions to be answered (e.g., single-hop or multi-hop). To address these limitations, we propose a unified system that answers open-domain questions of arbitrary complexity directly from text and works with off-the-shelf retrieval systems on arbitrary text collections. We employ a single multi-task model to perform all the necessary subtasks---retrieving supporting facts, reranking them, and predicting the answer from all retrieved documents---in an iterative fashion. To emulate a more realistic setting, we also construct a new unified benchmark by collecting about 200 multi-hop questions that require three Wikipedia pages to answer and combining them with existing datasets. We show that our model not only outperforms state-of-the-art systems on several existing benchmarks that exclusively feature single-hop or multi-hop open-domain questions, but also achieves strong performance on the new benchmark.
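
To make the iterative control flow concrete, the sketch below shows one plausible way such a retrieve-rerank-read loop could be wired up in Python. Everything in it (the retrieve, rerank, and read callables, their signatures, the max_hops budget, and the stopping criterion) is an illustrative assumption on our part, not the paper's actual interface.

from typing import Callable, List, Optional, Tuple

# Illustrative types for the three subtasks; these signatures are our
# own assumptions, not the paper's actual API.
Retriever = Callable[[str], List[str]]                          # query -> candidate passages
Reranker = Callable[[str, List[str]], List[str]]                # (question, candidates) -> ranked passages
Reader = Callable[[str, List[str]], Tuple[Optional[str], str]]  # (question, evidence) -> (answer or None, next query)


def answer_question(
    question: str,
    retrieve: Retriever,   # e.g., an off-the-shelf BM25 index
    rerank: Reranker,
    read: Reader,
    max_hops: int = 3,     # upper bound on retrieval iterations
    top_k: int = 5,        # passages kept per hop after reranking
) -> Optional[str]:
    """Iterate retrieve -> rerank -> read until an answer is predicted."""
    evidence: List[str] = []   # supporting facts accumulated across hops
    query = question           # the first hop queries with the question itself
    for _ in range(max_hops):
        candidates = retrieve(query)                           # retrieve
        evidence.extend(rerank(question, candidates)[:top_k])  # rerank
        answer, query = read(question, evidence)               # read, then iterate
        if answer is not None:                                 # reader chose to stop
            return answer
    return None  # no answer predicted within the hop budget

A single model playing all three roles would simply be passed in three times, once per subtask head; the loop itself is agnostic to whether the callables share parameters.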
