Supporting virtual integration of Linked Data with just-in-time query recompilation

Virtual data integration takes place at query execution time and relies on transforming the original query into queries for the many target endpoints where the data reside. In systems that integrate many data sources, this means maintaining many mappings, queries and query templates, and possibly issuing separate queries to link entities across the datasets and to retrieve their data. We propose a practical approach to keeping this complexity under control by manipulating the translation from one client query to many target queries. The method performs just-in-time recompilation of the client query into elements that are combined with a query template to produce the target queries for multiple sources. It was validated in a setting with a custom star-shaped query language as the client API and SPARQL endpoints as sources. The approach has been shown to reduce both the number of target queries to issue and the number of query templates to maintain. It uses a number of compiler functions that scales with the complexity of the data source, and its overhead can be neglected where the method is most effective.
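The recompilation step can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the dictionary form of the star-shaped client query, the `SOURCES` mappings and their example.org IRIs, the single shared `TEMPLATE`, and the two compiler functions `compile_variables` and `compile_patterns`. It only shows the general idea of compiling one client query into elements and combining them with one template to obtain one target SPARQL query per source.

```python
from string import Template

# Hypothetical star-shaped client query: one central entity type and
# the properties requested for it (not the paper's actual query language).
CLIENT_QUERY = {"type": "Person", "properties": ["name", "birthDate"]}

# One shared SPARQL template for all sources (an illustrative assumption).
TEMPLATE = Template(
    "SELECT ?s $variables WHERE {\n"
    "  ?s a <$type_iri> .\n"
    "$patterns\n"
    "}"
)

# Hypothetical per-source mappings from client terms to source IRIs.
SOURCES = {
    "source_a": {
        "types": {"Person": "http://example.org/a/Person"},
        "properties": {
            "name": "http://example.org/a/name",
            "birthDate": "http://example.org/a/born",
        },
    },
    "source_b": {
        "types": {"Person": "http://example.org/b/Human"},
        "properties": {"name": "http://example.org/b/label"},
    },
}


def compile_variables(query):
    """Compiler function: projection variables for the requested properties."""
    return " ".join(f"?{prop}" for prop in query["properties"])


def compile_patterns(query, mapping):
    """Compiler function: one OPTIONAL triple pattern per property the source maps."""
    lines = [
        f"  OPTIONAL {{ ?s <{mapping['properties'][prop]}> ?{prop} . }}"
        for prop in query["properties"]
        if prop in mapping["properties"]
    ]
    return "\n".join(lines)


def recompile(query, sources=SOURCES):
    """Recompile one client query into one target query per source."""
    return {
        name: TEMPLATE.substitute(
            variables=compile_variables(query),
            type_iri=mapping["types"][query["type"]],
            patterns=compile_patterns(query, mapping),
        )
        for name, mapping in sources.items()
    }


if __name__ == "__main__":
    for source, sparql in recompile(CLIENT_QUERY).items():
        print(f"-- {source}\n{sparql}\n")
```

In this sketch, adding a source means adding a mapping entry rather than a new template, which is the kind of maintenance saving the abstract describes; the actual system may organize its compiler functions and templates quite differently.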
