MINTE: semantically integrating RDF graphs

The nature of the RDF data model allows for numerous descriptions of the same entity. For example, different RDF vocabularies may be used to describe pharmacogenomic data, and the same drug or gene is represented by different RDF graphs in DBpedia and DrugBank. To provide a unified representation of the same real-world entity, RDF graphs need to be semantically integrated. Semantic integration requires managing the knowledge encoded in RDF vocabularies, e.g., axiomatic definitions of vocabulary properties or resource equivalences, to determine how different RDF representations of the same entity are related. We devise MINTE, an integration technique that relies on both knowledge stated in RDF vocabularies and semantic similarity measures to merge semantically equivalent RDF graphs, i.e., graphs corresponding to the same real-world entity. MINTE follows a two-fold approach to the problem of integrating RDF graphs. First, MINTE implements a 1-1 weighted perfect matching algorithm to identify semantically equivalent RDF entities in different graphs. Then, MINTE relies on different fusion policies to merge triples from these semantically equivalent RDF entities. We empirically evaluate the performance of MINTE on data from DBpedia, Wikidata, and DrugBank. The experimental results suggest that MINTE is able to accurately integrate semantically equivalent RDF graphs.
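The two steps described above can be illustrated with a small Python sketch. It is a toy under stated assumptions, not MINTE's implementation. Entities are compared with Jaccard similarity over their (predicate, object) pairs, standing in for a semantic similarity measure. The 1-1 weighted perfect matching is found by exhaustive search over permutations, which is only feasible for a handful of entities; a Hungarian-algorithm solver would be the usual choice at scale. The only fusion policy shown is a union policy that copies a matched entity's triples onto the first graph's subject IRI. The names `integrate`, `perfect_matching`, and `describe`, the `threshold` parameter, and the sample IRIs and triples are all hypothetical and purely illustrative.

```python
from itertools import permutations

# Hypothetical triple representation: (subject, predicate, object) strings.
Triple = tuple[str, str, str]


def describe(graph: set[Triple], subject: str) -> set[tuple[str, str]]:
    """Return the (predicate, object) pairs describing one subject."""
    return {(p, o) for s, p, o in graph if s == subject}


def jaccard(a: set, b: set) -> float:
    # Stand-in similarity (assumption): MINTE uses semantic similarity measures.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def perfect_matching(left, right, sim):
    """Maximum-weight 1-1 perfect matching by exhaustive search.

    Assumes len(left) == len(right) and only a few entities; a
    Hungarian-algorithm solver would replace this for real data.
    """
    best, best_score = [], float("-inf")
    for perm in permutations(right):
        pairs = list(zip(left, perm))
        score = sum(sim[pair] for pair in pairs)
        if score > best_score:
            best, best_score = pairs, score
    return best


def integrate(g1: set[Triple], g2: set[Triple], threshold: float = 0.3) -> set[Triple]:
    left = sorted({s for s, _, _ in g1})
    right = sorted({s for s, _, _ in g2})
    desc1 = {s: describe(g1, s) for s in left}
    desc2 = {s: describe(g2, s) for s in right}
    sim = {(e1, e2): jaccard(desc1[e1], desc2[e2]) for e1 in left for e2 in right}

    merged = set(g1)
    fused = set()
    # Step 1: match entities; keep only pairs whose similarity reaches the threshold.
    for e1, e2 in perfect_matching(left, right, sim):
        if sim[(e1, e2)] >= threshold:
            # Step 2 (union fusion policy): attach e2's descriptions to e1's IRI.
            merged |= {(e1, p, o) for p, o in desc2[e2]}
            fused.add(e2)
    # Entities of g2 without an equivalent in g1 are kept unchanged.
    merged |= {t for t in g2 if t[0] not in fused}
    return merged


# Illustrative sample graphs (hypothetical data, shared toy vocabulary).
G1 = {
    ("dbr:Aspirin", "rdf:type", "Drug"), ("dbr:Aspirin", "treats", "Pain"),
    ("dbr:Ibuprofen", "rdf:type", "Drug"), ("dbr:Ibuprofen", "treats", "Fever"),
}
G2 = {
    ("db:DB00945", "rdf:type", "Drug"), ("db:DB00945", "treats", "Pain"),
    ("db:DB00945", "casNumber", "50-78-2"),
    ("db:DB01050", "rdf:type", "Drug"), ("db:DB01050", "treats", "Fever"),
}

merged = integrate(G1, G2)
for triple in sorted(merged):
    print(triple)
```

On this toy input, the matching pairs `dbr:Aspirin` with `db:DB00945` and `dbr:Ibuprofen` with `db:DB01050`. The union policy then adds the `casNumber` triple to `dbr:Aspirin`, leaving five triples under the first graph's IRIs. Other fusion policies, such as keeping only one graph's value for a conflicting property, would change only the body of the `if` branch.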
