A Framework for Federated SPARQL Query Processing over Heterogeneous Linked Data Fragments

In recent years, the number of Linked Data Fragment (LDF) interfaces for querying RDF data on the Web has increased. These interfaces differ in the SPARQL expressions they can evaluate and in the metadata they provide. Existing client-side query processing approaches are optimized for evaluating queries over individual interfaces, and federated query processing has focused on federations with only a single type of LDF interface, typically SPARQL endpoints. In this work, we address the challenges of SPARQL query processing over federations with heterogeneous LDF interfaces. To this end, we formalize the concept of federations of Linked Data Fragment services and propose a framework for querying federations that combine different LDF interfaces. The framework comprises query decomposition, query planning, and physical operators adapted to the particularities of the different LDF interfaces. Further, we propose an approach for each component of the framework and evaluate these approaches in an experimental study on the well-known FedBench benchmark. The results show that interface-aware approaches, which exploit the capabilities of the heterogeneous interfaces in a federation, can substantially improve performance.
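To make the idea of interface-aware query decomposition more concrete, the following is a minimal sketch and not the approach proposed in this work. It assumes only the commonly known capabilities of the interfaces: a Triple Pattern Fragments (TPF) server evaluates single triple patterns, while a SPARQL endpoint can evaluate a whole basic graph pattern. All names (`Interface`, `Source`, `decompose`) and the `example.org` URLs are hypothetical, and source selection is assumed to have already produced the `relevant` mapping.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class Interface(Enum):
    TPF = "tpf"        # evaluates one triple pattern per request
    SPARQL = "sparql"  # evaluates full basic graph patterns


@dataclass(frozen=True)
class Source:
    url: str               # hypothetical service URL
    interface: Interface


TriplePattern = Tuple[str, str, str]
Subquery = Tuple[Source, List[TriplePattern]]


def decompose(relevant: Dict[TriplePattern, List[Source]]) -> List[Subquery]:
    """Split a basic graph pattern into per-source subqueries.

    Triple patterns whose only relevant source is the same SPARQL
    endpoint are grouped into one subquery for that endpoint. Every
    other pattern becomes a single-pattern subquery per relevant source.
    """
    grouped: Dict[Source, List[TriplePattern]] = defaultdict(list)
    subqueries: List[Subquery] = []
    for pattern, sources in relevant.items():
        if len(sources) == 1 and sources[0].interface is Interface.SPARQL:
            # Only one endpoint can answer it: let the endpoint join locally.
            grouped[sources[0]].append(pattern)
        else:
            # TPF servers, or several candidate sources: one pattern each.
            subqueries.extend((source, [pattern]) for source in sources)
    subqueries.extend(grouped.items())
    return subqueries


# Hypothetical federation with one SPARQL endpoint and one TPF server.
endpoint = Source("http://example.org/films/sparql", Interface.SPARQL)
tpf = Source("http://example.org/places/tpf", Interface.TPF)

tp1 = ("?film", "ex:director", "?person")
tp2 = ("?person", "ex:birthPlace", "?city")
tp3 = ("?city", "owl:sameAs", "?place")
tp4 = ("?place", "ex:population", "?pop")

relevant = {tp1: [endpoint], tp2: [endpoint], tp3: [endpoint, tpf], tp4: [tpf]}
plan = decompose(relevant)

for source, patterns in plan:
    print(source.interface.value, source.url, patterns)
```

In this sketch the two patterns answerable only by the endpoint are sent as one subquery, so their join is computed at the server, whereas every pattern sent to the TPF server stays a separate request that the client must join. A planner could then use interface-specific metadata to order these subqueries and to pick suitable physical join operators.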
