Semantic Data Integration for Knowledge Graph Construction at Query Time

The evolution of the Web of documents into a Web of services and data has increased the availability of data from almost any domain. For example, general-domain knowledge bases such as DBpedia or Wikidata, and domain-specific Web sources like the Oxford Art archive, provide access to knowledge about a wide variety of entities, including people, organizations, and paintings. However, these data sources publish data in different ways and may offer different search capabilities, e.g., SPARQL endpoints or REST services, so data integration techniques are required to provide a unified view of the published data. We devise a semantic data integration approach named FuhSen that exploits the keyword and structured search capabilities of Web data sources and generates on-demand knowledge graphs by merging data collected from the available sources. The resulting knowledge graphs model the semantics of the merged data in terms of the entities that satisfy a keyword query and the relationships among those entities. FuhSen relies on RDF to describe the collected entities semantically, and on semantic similarity measures to decide which entities are related closely enough to be merged. We empirically evaluate FuhSen's data integration techniques on data from the DBpedia knowledge base. The experimental results suggest that these techniques accurately integrate semantically similar entities into knowledge graphs.
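
The merging step described above can be illustrated with a minimal sketch. It is not FuhSen's implementation: the data layout (each entity as a subject URI with a set of predicate-object pairs), the function names `jaccard` and `merge_sources`, the threshold value, and the greedy one-to-one matching are all assumptions made for illustration, and plain Jaccard overlap stands in for the semantic similarity measure.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]          # (predicate, object) of one RDF triple
Source = Dict[str, Set[Pair]]   # subject URI -> its predicate-object pairs


def jaccard(a: Set[Pair], b: Set[Pair]) -> float:
    """Overlap of two descriptions; a stand-in for a semantic similarity measure."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_sources(
    left: Source, right: Source, threshold: float = 0.4
) -> List[Tuple[Set[str], Set[Pair]]]:
    """Merge entities from two sources whose similarity reaches the threshold.

    Matching is greedy and one-to-one (an assumption of this sketch):
    the highest-scoring pairs are merged first.
    """
    # Score every cross-source pair, best first.
    candidates = sorted(
        (
            (jaccard(l_pairs, r_pairs), l_subj, r_subj)
            for l_subj, l_pairs in left.items()
            for r_subj, r_pairs in right.items()
        ),
        reverse=True,
    )

    used_left: Set[str] = set()
    used_right: Set[str] = set()
    merged: List[Tuple[Set[str], Set[Pair]]] = []

    for score, l_subj, r_subj in candidates:
        if score < threshold:
            break  # remaining candidates score even lower
        if l_subj in used_left or r_subj in used_right:
            continue  # each entity is merged at most once
        used_left.add(l_subj)
        used_right.add(r_subj)
        # The merged entity keeps both URIs and the union of both descriptions.
        merged.append(({l_subj, r_subj}, left[l_subj] | right[r_subj]))

    # Entities without a sufficiently similar counterpart are kept unchanged.
    for subj, pairs in left.items():
        if subj not in used_left:
            merged.append(({subj}, set(pairs)))
    for subj, pairs in right.items():
        if subj not in used_right:
            merged.append(({subj}, set(pairs)))
    return merged


# Hypothetical example data for two sources answering the same keyword query.
source_a: Source = {
    "a:Picasso": {("name", "Pablo Picasso"), ("born", "1881"), ("type", "Person")},
}
source_b: Source = {
    "b:P_Picasso": {("name", "Pablo Picasso"), ("born", "1881"), ("field", "Painting")},
    "b:Dali": {("name", "Salvador Dali"), ("born", "1904")},
}

graph = merge_sources(source_a, source_b)
for subjects, pairs in graph:
    print(sorted(subjects), sorted(pairs))
```

In this example the two Picasso descriptions share two of four distinct pairs, so they are merged into one entity carrying both URIs, while the Dali entity stays separate. A production system would replace `jaccard` with a measure that accounts for the semantics of the properties and values, and might use an optimal assignment instead of greedy matching.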
