Knowledge Guided Text Retrieval and Reading for Open Domain Question Answering

We introduce an approach for open-domain question answering (QA) that retrieves and reads a passage graph, where vertices are passages of text and edges represent relationships derived from an external knowledge base or from co-occurrence in the same article. Our goals are to boost coverage by using knowledge-guided retrieval to find more relevant passages than text-matching methods, and to improve accuracy by allowing for better knowledge-guided fusion of information across related passages. Our graph retrieval method expands a set of seed keyword-retrieved passages by traversing the graph structure of the knowledge base. Our reader extends a BERT-based architecture and updates passage representations by propagating information from related passages and their relations, instead of reading each passage in isolation. Experiments on three open-domain QA datasets, WebQuestions, Natural Questions, and TriviaQA, show improvements of 2-11% absolute over non-graph baselines. Our approach also matches or exceeds the state of the art in every case, without using an expensive end-to-end training regime.
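The graph retrieval step described above can be illustrated with a minimal sketch. It assumes the passage graph is already available as an adjacency map and expands the seed passages breadth-first up to a hop limit and a passage budget. The function name `expand_passage_graph`, the parameters `max_hops` and `max_passages`, the relation labels, and the toy data are all hypothetical choices made for illustration; the abstract does not specify the traversal order, the budget, or any ranking of neighbors, so this should not be read as the authors' implementation.

```python
from collections import deque


def expand_passage_graph(seeds, neighbors, max_hops=1, max_passages=50):
    """Expand seed passages by following graph edges breadth-first.

    seeds: passage ids returned by keyword retrieval (assumed input).
    neighbors: dict mapping a passage id to a list of (relation, passage id)
        pairs, e.g. derived from a knowledge base or article co-occurrence.
    Returns the selected passage ids and the edges kept between them.
    """
    # Deduplicate seeds while preserving retrieval order.
    selected = list(dict.fromkeys(seeds))[:max_passages]
    seen = set(selected)
    edges = []
    queue = deque((passage, 0) for passage in selected)

    while queue:
        passage, hops = queue.popleft()
        if hops >= max_hops:
            continue  # do not expand beyond the hop limit
        for relation, neighbor in neighbors.get(passage, []):
            if neighbor not in seen:
                if len(selected) >= max_passages:
                    continue  # budget exhausted: skip new passages
                seen.add(neighbor)
                selected.append(neighbor)
                queue.append((neighbor, hops + 1))
            # Keep the edge only when both endpoints are selected.
            edges.append((passage, relation, neighbor))
    return selected, edges


# Toy graph with made-up passage ids and relation labels.
toy_neighbors = {
    "p1": [("spouse", "p3"), ("same_article", "p2")],
    "p2": [("same_article", "p1")],
    "p3": [("born_in", "p4")],
}
passages, relations = expand_passage_graph(["p1", "p2"], toy_neighbors, max_hops=1)
```

With one hop, the two seed passages pull in `p3` through the knowledge-base style edge from `p1`, while `p4` stays out of the graph because it is two hops away. A reader that propagates information along edges would then consume both the passage list and the relation triples.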
