UNIQORN: Unified Question Answering over RDF Knowledge Graphs and Natural Language Text

Question answering over knowledge graphs and other RDF data has been greatly advanced, with a number of good systems providing crisp answers for natural language questions or telegraphic queries. Some of these systems incorporate textual sources as additional evidence for the answering process, but cannot compute answers that are present in text alone. Conversely, systems from the IR and NLP communities have addressed QA over text, but such systems barely utilize semantic data and knowledge. This paper presents a method for complex questions that can seamlessly operate over a mixture of RDF datasets and text corpora, or over individual sources, in a unified framework. Our method, called UNIQORN, builds a context graph on the fly by retrieving question-relevant evidence from the RDF data and/or a text corpus, using fine-tuned BERT models. The resulting graph typically contains all question-relevant evidence but also a lot of noise. UNIQORN copes with this input by means of a graph algorithm for Group Steiner Trees that identifies the best answer candidates in the context graph. Experimental results on several benchmarks of complex questions with multiple entities and relations show that UNIQORN significantly outperforms state-of-the-art methods for heterogeneous QA. The graph-based methodology provides user-interpretable evidence for the complete answering process.
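The abstract does not spell out the Group Steiner Tree (GST) step, so what follows is only a minimal Python sketch under stated assumptions, not UNIQORN's actual algorithm. It assumes the context graph has already been built, that edge weights are costs where a lower weight means stronger evidence, and that each question phrase has been matched to a group of terminal nodes. In place of the paper's GST algorithm it swaps in the classic shortest-path heuristic: for every candidate root, connect the nearest node of each group via shortest paths and keep the cheapest resulting tree. The non-terminal nodes of that tree are then read off as answer candidates. The function names, weights, and toy graph are hypothetical.

```python
import heapq
from collections import defaultdict


def shortest_paths(adj, source):
    """Dijkstra from `source`; returns distance and parent-pointer dicts."""
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def approx_group_steiner_tree(edges, groups):
    """Shortest-path heuristic for Group Steiner Tree (hypothetical helper, not exact).

    edges:  iterable of (u, v, weight) for an undirected context graph;
            a lower weight is assumed to mean stronger evidence.
    groups: list of node sets; the tree must touch one node from each group.
    Returns (cost, tree_edges) for the cheapest root tried, or None.
    """
    adj = defaultdict(list)
    weight = {}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
        key = frozenset((u, v))
        weight[key] = min(w, weight.get(key, w))  # keep cheapest parallel edge
    best = None
    for root in list(adj):
        dist, parent = shortest_paths(adj, root)
        tree_edges = set()
        for group in groups:
            reachable = [n for n in group if n in dist]
            if not reachable:
                break  # this root cannot connect every group
            node = min(reachable, key=dist.__getitem__)
            # Walk parent pointers back to the root; all paths come from one
            # shortest-path tree, so their union is itself a tree.
            while parent[node] is not None:
                tree_edges.add(frozenset((node, parent[node])))
                node = parent[node]
        else:  # runs only if every group was reached
            cost = sum(weight[e] for e in tree_edges)
            if best is None or cost < best[0]:
                best = (cost, tree_edges)
    return best


def answer_candidates(tree_edges, groups):
    """Tree nodes not matched to any question phrase (hypothetical helper)."""
    terminals = set().union(*groups)
    nodes = {n for e in tree_edges for n in e}
    return sorted(nodes - terminals)


if __name__ == "__main__":
    # Toy context graph for "Which actor starred in Inception and Titanic?"
    edges = [
        ("Inception", "Leonardo DiCaprio", 1.0),
        ("Titanic", "Leonardo DiCaprio", 1.0),
        ("Inception", "Christopher Nolan", 1.0),
        ("Titanic", "James Cameron", 1.0),
        ("Titanic", "Kate Winslet", 1.0),
        ("Kate Winslet", "Leonardo DiCaprio", 2.0),
    ]
    groups = [{"Inception"}, {"Titanic"}]
    cost, tree = approx_group_steiner_tree(edges, groups)
    print(cost, answer_candidates(tree, groups))  # 2.0 ['Leonardo DiCaprio']
```

The heuristic runs one Dijkstra per node and does not guarantee a minimum-cost tree, which is what an exact or provably approximate GST algorithm would provide; it is meant only to show how connecting the matched terminals surfaces an answer node that lies between them.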
