Unified Open-Domain Question Answering with Structured and Unstructured Knowledge

We study open-domain question answering (ODQA) with structured, unstructured and semi-structured knowledge sources, including text, tables, lists, and knowledge bases. Our approach homogenizes all sources by reducing them to text, and applies recent, powerful retriever-reader models which have so far been limited to text sources only. We show that knowledge-base QA can be greatly improved when reformulated in this way. Contrary to previous work, we find that combining sources always helps, even for datasets which target a single source by construction. As a result, our unified model produces state-of-the-art results on 3 popular ODQA benchmarks.

[1]  Ming-Wei Chang,et al.  Latent Retrieval for Weakly Supervised Open Domain Question Answering , 2019, ACL.

[2]  Feng Xia,et al.  Introduction to , 2015, ACM Trans. Multim. Comput. Commun. Appl..

[3]  Wen-tau Yih,et al.  RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering , 2020, ArXiv.

[4]  Ming-Wei Chang,et al.  BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.

[5]  Edouard Grave,et al.  Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering , 2020, EACL.

[6]  Ming-Wei Chang,et al.  Natural Questions: A Benchmark for Question Answering Research , 2019, TACL.

[7]  William W. Cohen,et al.  PullNet: Open Domain Question Answering with Iterative Retrieval on Knowledge Bases and Text , 2019, EMNLP.

[8]  Rajarshi Das,et al.  Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks , 2017, ACL.

[9]  Ye Li,et al.  Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval , 2020, ArXiv.

[10]  Gerhard Weikum,et al.  Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs , 2019, SIGIR.

[11]  Ruslan Salakhutdinov,et al.  Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text , 2018, EMNLP.

[12]  Andrew Chou,et al.  Semantic Parsing on Freebase from Question-Answer Pairs , 2013, EMNLP.

[13]  Eunsol Choi,et al.  TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension , 2017, ACL.

[14]  Jason Weston,et al.  Reading Wikipedia to Answer Open-Domain Questions , 2017, ACL.

[15]  Wenhan Xiong,et al.  Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval , 2020, ICLR.

[16]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[17]  Hao Ma,et al.  Table Cell Search for Question Answering , 2016, WWW.

[18]  Percy Liang,et al.  Know What You Don’t Know: Unanswerable Questions for SQuAD , 2018, ACL.

[19]  Ming-Wei Chang,et al.  The Value of Semantic Parse Labeling for Knowledge Base Question Answering , 2016, ACL.

[20]  David A. Ferrucci,et al.  Introduction to "This is Watson" , 2012, IBM J. Res. Dev..

[21]  Jimmy J. Lin,et al.  End-to-End Open-Domain Question Answering with BERTserini , 2019, NAACL.

[22]  Ming-Wei Chang,et al.  Semantic Parsing via Staged Query Graph Generation: Question Answering with Knowledge Base , 2015, ACL.

[23]  Wenhu Chen,et al.  HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data , 2020, EMNLP.

[24]  Markus Krötzsch,et al.  Wikidata , 2014, Commun. ACM.

[25]  Jian Sun,et al.  A Survey on Complex Question Answering over Knowledge Base: Recent Advances and Challenges , 2020, ArXiv.

[26]  Andrew McCallum,et al.  Relation Extraction with Matrix Factorization and Universal Schemas , 2013, NAACL.

[27]  Percy Liang,et al.  Compositional Semantic Parsing on Semi-Structured Tables , 2015, ACL.

[28]  Jens Lehmann,et al.  DBpedia: A Nucleus for a Web of Open Data , 2007, ISWC/ASWC.

[29]  Dongyan Zhao,et al.  Hybrid Question Answering over Knowledge Base and Free Text , 2016, COLING.

[30]  Graham Neubig,et al.  TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data , 2020, ACL.

[31]  Oren Etzioni,et al.  Open question answering over curated and extracted knowledge bases , 2014, KDD.

[32]  Ming-Wei Chang,et al.  REALM: Retrieval-Augmented Language Model Pre-Training , 2020, ICML.

[33]  Ellen M. Voorhees,et al.  The TREC-8 Question Answering Track Report , 1999, TREC.

[34]  Sarthak Jain,et al.  Question Answering over Knowledge Base using Factual Memory Networks , 2016, NAACL.

[35]  Jonathan Berant,et al.  MultiQA: An Empirical Investigation of Generalization and Transfer in Reading Comprehension , 2019, ACL.

[36]  Noah Constant,et al.  MultiReQA: A Cross-Domain Evaluation forRetrieval Question Answering Models , 2020, ADAPTNLP.

[37]  Wenhu Chen,et al.  Open Question Answering over Tables and Text , 2020, ArXiv.

[38]  Wen-tau Yih,et al.  Efficient One-Pass End-to-End Entity Linking for Questions , 2020, EMNLP.

[39]  Danqi Chen,et al.  Dense Passage Retrieval for Open-Domain Question Answering , 2020, EMNLP.

[40]  Enrico Motta,et al.  Is Question Answering fit for the Semantic Web?: A survey , 2011, Semantic Web.

[41]  Eunsol Choi,et al.  MRQA 2019 Shared Task: Evaluating Generalization in Reading Comprehension , 2019, MRQA@EMNLP.

[42]  Petr Baudis,et al.  Modeling of the Question Answering Task in the YodaQA System , 2015, CLEF.

[43]  Wenhu Chen,et al.  TabFact: A Large-scale Dataset for Table-based Fact Verification , 2019, ICLR.

[44]  Colin Raffel,et al.  Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..

[45]  Thomas Pellissier Tanon,et al.  From Freebase to Wikidata: The Great Migration , 2016, WWW.