Using Semantics and Statistics to Turn Data into Knowledge

Many information extraction and knowledge base construction systems address the challenge of deriving knowledge from text. A key problem in constructing knowledge bases from sources such as the web is overcoming the erroneous and incomplete information found in millions of candidate extractions. To solve this problem, we turn to semantics, using ontological constraints between candidate facts to eliminate errors. In this article, we represent the desired knowledge base as a knowledge graph and introduce the problem of knowledge graph identification: collectively resolving the entities, labels, and relations present in the knowledge graph. Knowledge graph identification requires reasoning jointly over millions of extractions, which poses a scalability challenge for many approaches. We use probabilistic soft logic (PSL), a recently introduced statistical relational learning framework, to implement an efficient solution to knowledge graph identification, and we present state-of-the-art results for knowledge graph construction while running an order of magnitude faster than competing methods.
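
To make the role of ontological constraints concrete, the following is a minimal sketch, not the article's implementation. It assumes the Lukasiewicz relaxation that PSL uses for logical rules, with squared hinge penalties, and it treats the mutual-exclusion constraint as a heavily weighted rule for simplicity. The entity, the labels "country" and "bird", the confidence values, the weights, and all function names are hypothetical. A coarse grid search stands in for PSL's convex optimization.

```python
from itertools import product


def distance_to_satisfaction(body, head):
    """Lukasiewicz relaxation of (b1 AND ... AND bn) -> head, truth values in [0, 1]."""
    body_truth = max(0.0, sum(body) - (len(body) - 1))
    return max(0.0, body_truth - head)


# Hypothetical extractor confidences for two candidate labels of one entity.
CANDIDATES = {"country": 0.9, "bird": 0.6}


def penalty(country, bird, w_extract=1.0, w_mutex=10.0):
    """Weighted sum of squared distances to satisfaction (weights are assumptions)."""
    # Candidate extraction -> label: believe each label at least as much as its extraction.
    total = w_extract * distance_to_satisfaction([CANDIDATES["country"]], country) ** 2
    total += w_extract * distance_to_satisfaction([CANDIDATES["bird"]], bird) ** 2
    # Ontological constraint: country AND bird -> false (the labels are mutually exclusive).
    total += w_mutex * distance_to_satisfaction([country, bird], 0.0) ** 2
    return total


# Grid search over soft truth values; a stand-in for a real convex solver.
GRID = [i / 20 for i in range(21)]
best = min(product(GRID, GRID), key=lambda cb: penalty(*cb))
print("country=%.2f bird=%.2f" % best)
```

In this toy setting the constraint pulls both truth values down until the two labels are nearly exclusive, and the label with the stronger extraction keeps the higher value. Reasoning over all candidate facts at once, rather than thresholding each extraction separately, is the idea behind collective resolution.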
