An evaluation of approaches to federated query processing over linked data

The Web has evolved from a global information space of linked documents into a web of linked data. The Web of Data makes it possible to answer complex, structured queries that no single data source could answer alone. The current practice for working with multiple distributed linked data sources is to load the desired data into a single RDF store and process queries centrally against the merged data set. This approach, however, is not always practically feasible or desirable. In this paper, we analyze alternative approaches to federated query processing over linked data and how different design alternatives affect the performance and practicality of query processing. To this end, we define a benchmark for federated query processing that comprises data sources from various domains and a set of representative queries. Using the benchmark, we run experiments with different federation alternatives and provide insights into their respective advantages and disadvantages.
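The contrast between centralized and federated processing can be illustrated with a small sketch. The code below is a minimal, hypothetical illustration and does not describe any system or benchmark query from the paper. Triples are plain Python tuples. Variables are strings starting with `?`. The names `SOURCES`, `QUERY`, `centralized` and `federated` are invented for this example. Source selection is deliberately naive: every triple pattern is sent to every source.

```python
from itertools import product

Triple = tuple[str, str, str]


def is_var(term: str) -> bool:
    """Variables are written SPARQL-style, e.g. '?film'."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triples) -> list[dict]:
    """Return every variable binding under which `pattern` matches a triple."""
    results = []
    for triple in triples:
        binding, ok = {}, True
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable repeated within one pattern must bind consistently.
                if binding.get(p_term, t_term) != t_term:
                    ok = False
                    break
                binding[p_term] = t_term
            elif p_term != t_term:
                ok = False
                break
        if ok:
            results.append(binding)
    return results


def join(left: list[dict], right: list[dict]) -> list[dict]:
    """Nested-loop natural join of two binding lists on their shared variables."""
    return [
        {**l, **r}
        for l, r in product(left, right)
        if all(l[v] == r[v] for v in l.keys() & r.keys())
    ]


def evaluate_bgp(patterns: list[Triple], triples) -> list[dict]:
    """Evaluate a basic graph pattern against a single triple collection."""
    bindings = [{}]
    for pattern in patterns:
        bindings = join(bindings, match_pattern(pattern, triples))
    return bindings


def centralized(patterns: list[Triple], sources: dict) -> list[dict]:
    """Merge all sources into one triple set first, then query it locally."""
    merged = set().union(*sources.values())
    return evaluate_bgp(patterns, merged)


def federated(patterns: list[Triple], sources: dict) -> list[dict]:
    """Send each pattern to every source, concatenate the answers, join locally."""
    bindings = [{}]
    for pattern in patterns:
        partial = []
        for triples in sources.values():
            partial.extend(match_pattern(pattern, triples))  # bag union
        bindings = join(bindings, partial)
    return bindings


# Hypothetical example data: two sources that only jointly answer the query.
SOURCES = {
    "films": {("ex:Film1", "ex:director", "ex:Alice")},
    "people": {("ex:Alice", "ex:bornIn", "ex:Berlin")},
}
QUERY = [("?film", "ex:director", "?d"), ("?d", "ex:bornIn", "?city")]

if __name__ == "__main__":
    print("centralized:", centralized(QUERY, SOURCES))
    print("federated:  ", federated(QUERY, SOURCES))
    for name, triples in SOURCES.items():
        print(f"{name} alone:", evaluate_bgp(QUERY, triples))
```

Both strategies return the same binding for `?film`, `?d` and `?city`. Neither source can answer the query on its own. Because the federated variant takes a bag union of per-source answers, it would return duplicates for triples held by several sources, which the merged set in the centralized variant removes. Real federation systems also use smarter source selection and join strategies than this nested-loop sketch.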
