SP2Bench: A SPARQL Performance Benchmark

The SPARQL query language for RDF has recently reached W3C Recommendation status. In response to this emerging standard, the database community is exploring efficient storage techniques for RDF data and evaluation strategies for SPARQL queries. A meaningful analysis and comparison of these approaches requires a comprehensive and universal benchmark platform. To this end, we have developed SP^2Bench, a publicly available, language-specific SPARQL performance benchmark. SP^2Bench is set in the DBLP scenario and comprises both a data generator for creating arbitrarily large DBLP-like documents and a set of carefully designed benchmark queries. The generated documents mirror key characteristics and social-world distributions encountered in the original DBLP data set, while the queries implement meaningful requests on top of this data, covering a variety of SPARQL operator constellations and RDF access patterns. As a proof of concept, we apply SP^2Bench to existing engines and discuss the strengths and weaknesses that the benchmark results reveal.
