SPLODGE: Systematic Generation of SPARQL Benchmark Queries for Linked Open Data

The distributed and heterogeneous nature of Linked Open Data requires flexible and federated techniques for query evaluation. In order to evaluate current federation querying approaches a general methodology for conducting benchmarks is mandatory. In this paper, we present a classification methodology for federated SPARQL queries. This methodology can be used by developers of federated querying approaches to compose a set of test benchmarks that cover diverse characteristics of different queries and allows for comparability. We further develop a heuristic called SPLODGE for automatic generation of benchmark queries that is based on this methodology and takes into account the number of sources to be queried and several complexity parameters. We evaluate the adequacy of our methodology and the query generation strategy by applying them on the 2011 billion triple challenge data set.

[1]  Lakhmi C. Jain,et al.  New Directions in Web Data Management 1 , 2011, New Directions in Web Data Management 1.

[2]  Michael Schmidt,et al.  Foundations of SPARQL query optimization , 2008, ICDT '10.

[3]  Lora Aroyo,et al.  The Semantic Web: Research and Applications , 2009, Lecture Notes in Computer Science.

[4]  Jürgen Umbrich,et al.  Comparing data summaries for processing live queries over Linked Data , 2011, World Wide Web.

[5]  Tim Berners-Lee,et al.  Linked data , 2020, Semantic Web for the Working Ontologist.

[6]  E. Prud hommeaux,et al.  SPARQL query language for RDF , 2011 .

[7]  Steffen Staab,et al.  SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions , 2011, COLD.

[8]  Stijn Vansummeren,et al.  What are real SPARQL queries like? , 2011, SWIM '11.

[9]  Jens Lehmann,et al.  DBpedia SPARQL Benchmark - Performance Assessment with Real Queries on Real Data , 2011, SEMWEB.

[10]  Günter Ladwig,et al.  FedBench: A Benchmark Suite for Federated Semantic Data Query Processing , 2011, SEMWEB.

[11]  Marko A. Rodriguez,et al.  A Graph Analysis of the Linked Data Cloud , 2009, ArXiv.

[12]  Steffen Staab,et al.  Federated Data Management and Query Optimization for Linked Open Data , 2011, New Directions in Web Data Management 1.

[13]  Christian Bizer,et al.  Executing SPARQL Queries over the Web of Linked Data , 2009, SEMWEB.

[14]  Heiner Stuckenschmidt,et al.  Index structures and algorithms for querying distributed RDF repositories , 2004, WWW '04.

[15]  Óscar Corcho,et al.  Semantics and Optimization of the SPARQL 1.1 Federation Extension , 2011, ESWC.

[16]  Guido Moerkotte,et al.  Characteristic sets: Accurate cardinality estimation for RDF queries with multiple joins , 2011, 2011 IEEE 27th International Conference on Data Engineering.

[17]  Jeff Heflin,et al.  LUBM: A benchmark for OWL knowledge base systems , 2005, J. Web Semant..

[18]  Maribel Acosta,et al.  ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints , 2011, SEMWEB.

[19]  Octavian Udrea,et al.  Apples and oranges: a comparison of RDF benchmarks and real RDF datasets , 2011, SIGMOD '11.

[20]  Georg Lausen,et al.  SP^2Bench: A SPARQL Performance Benchmark , 2008, 2009 IEEE 25th International Conference on Data Engineering.

[21]  Gerhard Weikum,et al.  RDF-3X: a RISC-style engine for RDF , 2008, Proc. VLDB Endow..

[22]  Lora Aroyo,et al.  The Semantic Web - ISWC 2011 - 10th International Semantic Web Conference, Bonn, Germany, October 23-27, 2011, Proceedings, Part I , 2011, SEMWEB.

[23]  Jürgen Umbrich,et al.  Data summaries for on-demand queries over linked data , 2010, WWW '10.

[24]  Jun Zhao,et al.  Describing Linked Datasets On the Design and Usage of voiD, the "Vocabulary Of Interlinked Datasets" , 2009 .

[25]  Georg Lausen,et al.  SP2Bench: A SPARQL Performance Benchmark , 2008, Semantic Web Information Management.

[26]  Jeff Z. Pan,et al.  The Semanic Web: Research and Applications - 8th Extended Semantic Web Conference, ESWC 2011, Heraklion, Crete, Greece, May 29 - June 2, 2011, Proceedings, Part II , 2011, ESWC.

[27]  Steffen Staab,et al.  Networked graphs: a declarative mechanism for SPARQL rules, SPARQL views and RDF data integration on the web , 2008, WWW.

[28]  Ian Horrocks,et al.  The Semantic Web – ISWC 2010: 9th International Semantic Web Conference, ISWC 2010, Shanghai, China, November 7-11, 2010, Revised Selected Papers, Part I , 2010, SEMWEB.

[29]  Katja Hose,et al.  FedX: Optimization Techniques for Federated Query Processing on Linked Data , 2011, SEMWEB.

[30]  Christian Bizer,et al.  The Berlin SPARQL Benchmark , 2009, Int. J. Semantic Web Inf. Syst..

[31]  Tim Berners-Lee,et al.  Linked Data - The Story So Far , 2009, Int. J. Semantic Web Inf. Syst..

[32]  Olaf Hartig,et al.  A Database Perspective on Consuming Linked Data on the Web , 2010, Datenbank-Spektrum.

[33]  Ulf Leser,et al.  Querying Distributed RDF Data Sources with SPARQL , 2008, ESWC.

[34]  Siegfried Handschuh,et al.  Learning from Linked Open Data Usage: Patterns & Metrics , 2010 .

[35]  Pablo de la Fuente,et al.  An Empirical Study of Real-World SPARQL Queries , 2011, ArXiv.

[36]  Abraham Bernstein,et al.  The Semantic Web - ISWC 2009, 8th International Semantic Web Conference, ISWC 2009, Chantilly, VA, USA, October 25-29, 2009. Proceedings , 2009, SEMWEB.

[37]  Günter Ladwig,et al.  Linked Data Query Processing Strategies , 2010, SEMWEB.