Towards Efficient Distributed SPARQL Queries on Linked Data

The fast growth of the web of linked data raises new challenges for distributed query processing. Unlike the sources in traditional federated databases, linked data sources cannot cooperate with one another, so sophisticated optimization techniques are necessary for efficient query processing. Source selection and distributed join operations are key factors in the performance of linked data query engines. In this paper, we propose identifier-graph-based source selection, which takes into account the logical relationships between triple patterns, and develop effective solutions for distributed join operations that avoid program errors and minimize network traffic. In experiments, we demonstrate the practicability and efficiency of our approaches on a set of real-world queries and data sources from the Linked Open Data cloud. With the implemented prototype system, we achieve a significant improvement in source selection accuracy and query performance over state-of-the-art federated query engines.
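To make the idea of join-aware source selection concrete, the following is a minimal toy sketch in Python. It is not the identifier graph algorithm of the paper, whose details the abstract does not give. It only illustrates the general principle that a source matching a triple pattern in isolation can still be pruned when its values for a shared join variable overlap with no source of the other patterns. All names (`select_sources`, `bindings`, the in-memory `sources` dictionary) are hypothetical, and the sketch assumes that every pattern contains the join variable and that source contents can be inspected locally, which a real engine would have to approximate with statistics or probe queries.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables


def is_var(term: str) -> bool:
    return term.startswith("?")


def matches(pattern: Pattern, triple: Triple) -> bool:
    """A triple matches if every constant term of the pattern is equal."""
    return all(is_var(p) or p == t for p, t in zip(pattern, triple))


def bindings(pattern: Pattern, triples: List[Triple], var: str) -> Set[str]:
    """Values that `var` takes in the triples matching `pattern`."""
    pos = pattern.index(var)  # assumption: `var` occurs in the pattern
    return {t[pos] for t in triples if matches(pattern, t)}


def select_sources(
    patterns: List[Pattern],
    sources: Dict[str, List[Triple]],
    join_var: str,
) -> List[Set[str]]:
    """Return, per pattern, the sources that can contribute to the join."""
    # Step 1: pattern-wise selection, ignoring the other patterns.
    candidates = [
        {name for name, triples in sources.items()
         if any(matches(p, t) for t in triples)}
        for p in patterns
    ]
    # Step 2: collect the join-variable values each candidate offers.
    values = [
        {name: bindings(p, sources[name], join_var) for name in cand}
        for p, cand in zip(patterns, candidates)
    ]
    # Step 3: keep a source only if, for every other pattern, at least
    # one candidate source shares a join-variable value with it.
    pruned: List[Set[str]] = []
    for i, vals in enumerate(values):
        keep = {
            name for name, own in vals.items()
            if all(
                any(own & other for other in values[j].values())
                for j in range(len(values)) if j != i
            )
        }
        pruned.append(keep)
    return pruned
```

For example, with one source holding `("ex:alice", "foaf:knows", "ex:bob")`, a second holding `("ex:bob", "foaf:name", "Bob")`, and a third holding `("ex:carol", "foaf:name", "Carol")`, the query patterns `("?x", "foaf:knows", "?y")` and `("?y", "foaf:name", "?n")` joined on `?y` keep only the first two sources. The third matches the second pattern on its own but shares no value of `?y` with any source of the first pattern, so sending it a subquery would only add network traffic.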
