Knowledge base completion via search-based question answering

Over the past few years, massive amounts of world knowledge have been accumulated in publicly available knowledge bases, such as Freebase, NELL, and YAGO. Yet despite their seemingly huge size, these knowledge bases are highly incomplete. For example, over 70% of people included in Freebase have no known place of birth, and 99% have no known ethnicity. In this paper, we propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way. In particular, for each entity attribute, we learn the best set of queries to ask, such that the answer snippets returned by the search engine are most likely to contain the correct value for that attribute. For example, if we want to find Frank Zappa's mother, we could issue the query "who is the mother of Frank Zappa". However, this is likely to return "The Mothers of Invention", which was the name of his band. Our system learns that, in this case, it should add disambiguating terms, such as Zappa's place of birth, to make it more likely that the search results contain snippets mentioning his mother. Our system also learns how many different queries to ask for each attribute, since in some cases asking too many can hurt accuracy by introducing false positives. We discuss how to aggregate candidate answers across multiple queries, ultimately returning probabilistic predictions for the possible values of each attribute. Finally, we evaluate our system and show that it is able to extract a large number of facts with high confidence.
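
The two mechanical steps described above, augmenting a query with a disambiguating attribute and aggregating candidate answers across queries, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the query templates, the candidate scores, and the sum-and-normalize aggregation are hypothetical stand-ins, not the method used in the paper, and the normalized scores are not calibrated probabilities.

```python
from collections import defaultdict


def build_queries(entity, attribute, known_facts, templates):
    """Fill each template with the entity, the target attribute, and any
    known facts; templates that need a fact we do not have are skipped."""
    queries = []
    for template in templates:
        try:
            queries.append(
                template.format(entity=entity, attribute=attribute, **known_facts)
            )
        except KeyError:
            # The template refers to a fact missing from known_facts.
            continue
    return queries


def aggregate(candidates_per_query):
    """Sum each candidate's score over all queries, then normalize so the
    scores add up to 1. This is a simple stand-in for a learned aggregator."""
    totals = defaultdict(float)
    for scored in candidates_per_query:
        for answer, score in scored.items():
            totals[answer] += score
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {answer: score / norm for answer, score in totals.items()}


# Hypothetical templates; the last one needs a fact we do not supply.
templates = [
    "who is the {attribute} of {entity}",
    "{entity} {attribute} {place_of_birth}",
    "{entity} {attribute} {spouse}",
]
queries = build_queries(
    "Frank Zappa", "mother", {"place_of_birth": "Baltimore"}, templates
)

# Made-up per-query candidate scores, one dict per query issued.
per_query = [
    {"The Mothers of Invention": 0.6, "(mother's name)": 0.2},
    {"(mother's name)": 0.9},
]
predictions = aggregate(per_query)
```

In this toy run the plain query favors the band name, while the query augmented with the place of birth shifts the aggregated score toward the intended answer.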
