FedBench: A Benchmark Suite for Federated Semantic Data Query Processing

In this paper we present FedBench, a comprehensive benchmark suite for testing and analyzing the performance of federated query processing strategies on semantic data. The major challenge lies in the heterogeneity of semantic data use cases, where applications may face different settings at both the data and query level, such as varying data access interfaces, incomplete knowledge about data sources, availability of different statistics, and varying degrees of query expressiveness. Accounting for this heterogeneity, we present a highly flexible benchmark suite, which can be customized to accommodate a variety of use cases and compare competing approaches. We discuss design decisions, highlight the flexibility in customization, and elaborate on the choice of data and query sets. The practicability of our benchmark is demonstrated by a rigorous evaluation of various application scenarios, where we indicate both the benefits as well as limitations of the state-of-the-art federated query processing strategies for semantic data.

[1]  Gerhard Weikum,et al.  RDF-3X: a RISC-style engine for RDF , 2008, Proc. VLDB Endow..

[2]  Jürgen Umbrich,et al.  Comparing data summaries for processing live queries over Linked Data , 2011, World Wide Web.

[3]  Vassilis Christophides,et al.  Benchmarking RDF Schemas for the Semantic Web , 2002, SEMWEB.

[4]  Jim Gray Database and Transaction Processing Performance Handbook , 1993, The Benchmark Handbook.

[5]  Claudio Gutiérrez,et al.  The Expressive Power of SPARQL , 2008, SEMWEB.

[6]  Günter Ladwig,et al.  The Information Workbench. Interacting with the Web of Data , 2010, FIS 2010.

[7]  James A. Hendler,et al.  The Semantic Web — ISWC 2002 , 2002, Lecture Notes in Computer Science.

[8]  Frank van Harmelen,et al.  Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema , 2002, SEMWEB.

[9]  Jeff Z. Pan,et al.  The Semanic Web: Research and Applications - 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29 - June 2, 2011, Proceedings, Part II , 2011, ESWC.

[10]  Michael Schmidt,et al.  Foundations of SPARQL query optimization , 2008, ICDT '10.

[11]  Daniel J. Abadi,et al.  Scalable Semantic Web Data Management Using Vertical Partitioning , 2007, VLDB.

[12]  Óscar Corcho,et al.  Semantics and Optimization of the SPARQL 1.1 Federation Extension , 2011, ESWC.

[13]  Octavian Udrea,et al.  Apples and oranges: a comparison of RDF benchmarks and real RDF datasets , 2011, SIGMOD '11.

[14]  Steffen Staab,et al.  The Semantic Web - ISWC 2008, 7th International Semantic Web Conference, ISWC 2008, Karlsruhe, Germany, October 26-30, 2008. Proceedings , 2008, SEMWEB.

[15]  Steffen Staab,et al.  Federated Data Management and Query Optimization for Linked Open Data , 2011, New Directions in Web Data Management 1.

[16]  Abraham Bernstein,et al.  The Semantic Web - ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009. Proceedings , 2009, SEMWEB.

[17]  Günter Ladwig,et al.  Linked Data Query Processing Strategies , 2010, SEMWEB.

[18]  Lora Aroyo,et al.  The Semantic Web: Research and Applications , 2009, Lecture Notes in Computer Science.

[19]  Christian Bizer,et al.  Executing SPARQL Queries over the Web of Linked Data , 2009, SEMWEB.

[20]  Abraham Bernstein,et al.  Hexastore: sextuple indexing for semantic web data management , 2008, Proc. VLDB Endow..

[21]  Jürgen Umbrich,et al.  Data summaries for on-demand queries over linked data , 2010, WWW '10.

[22]  Jun Zhao,et al.  Describing Linked Datasets On the Design and Usage of voiD, the "Vocabulary Of Interlinked Datasets" , 2009 .

[23]  Marcelo Arenas,et al.  Semantics and Complexity of SPARQL , 2006, International Semantic Web Conference.

[24]  Georg Lausen,et al.  SP2Bench: A SPARQL Performance Benchmark , 2008, Semantic Web Information Management.

[25]  Peter Haase,et al.  An evaluation of approaches to federated query processing over linked data , 2010, I-SEMANTICS '10.

[26]  Jeff Heflin,et al.  LUBM: A benchmark for OWL knowledge base systems , 2005, J. Web Semant..

[27]  Günter Ladwig,et al.  SIHJoin: Querying Remote and Local Linked Data , 2011, ESWC.

[28]  Georg Lausen,et al.  SP^2Bench: A SPARQL Performance Benchmark , 2008, 2009 IEEE 25th International Conference on Data Engineering.

[29]  Christian Bizer,et al.  The Berlin SPARQL Benchmark , 2009, Int. J. Semantic Web Inf. Syst..

[30]  Tim Berners-Lee,et al.  Linked Data - The Story So Far , 2009, Int. J. Semantic Web Inf. Syst..

[31]  Ian Horrocks,et al.  The Semantic Web – ISWC 2010: 9th International Semantic Web Conference, ISWC 2010, Shanghai, China, November 7-11, 2010, Revised Selected Papers, Part I , 2010, SEMWEB.

[32]  Katja Hose,et al.  FedX: Optimization Techniques for Federated Query Processing on Linked Data , 2011, SEMWEB.