Usage-Centric Benchmarking of RDF Triple Stores

A central component in many applications is the underlying data management layer. In Data-Web applications, the central component of this layer is the triple store. Finding the most suitable store for a given application is therefore crucial, both for individual projects and for data integration on the Data Web in general. In this paper, we propose a generic benchmark creation procedure for SPARQL, which we apply to the DBpedia knowledge base. In contrast to previous approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data that does not resemble a relational schema. In addition, our approach considers not only the query strings but also the features of the queries during benchmark generation. Our generic procedure for benchmark creation is based on query-log mining, SPARQL feature analysis, and clustering. After presenting the method underlying our benchmark generation algorithm, we use the generated benchmark to compare the popular triple store implementations Virtuoso, Sesame, Jena-TDB, and BigOWLIM.
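The three ingredients named above (query-log mining, SPARQL feature analysis, and clustering) can be illustrated with a minimal sketch. It is not the authors' algorithm: the keyword list `FEATURES`, the helper names, and the sample log are all hypothetical, and the clustering step is replaced by a much simpler grouping of queries that share the same feature vector.

```python
import re
from collections import Counter, defaultdict

# Hypothetical feature list chosen for illustration; the feature set
# actually used for the benchmark may differ.
FEATURES = ("OPTIONAL", "UNION", "FILTER", "DISTINCT", "REGEX", "LANG")


def feature_vector(query: str) -> tuple:
    """Return a 0/1 tuple marking which keywords occur in the query.

    This is a plain keyword match, so a keyword that appears inside a
    URI or a literal is also counted; a real analysis would parse the query.
    """
    upper = query.upper()
    return tuple(
        int(re.search(rf"\b{kw}\b", upper) is not None) for kw in FEATURES
    )


def select_benchmark_queries(log):
    """Mine a query log and keep one representative query per feature vector.

    Stand-in for clustering: queries are grouped by identical feature
    vectors, and the most frequent query of each group is returned.
    """
    # Log mining: count how often each distinct query string was issued.
    counts = Counter(q.strip() for q in log if q.strip())
    groups = defaultdict(list)
    for query, freq in counts.items():
        groups[feature_vector(query)].append((freq, query))
    # Ties on frequency are broken by the lexicographically larger string.
    return {vec: max(members)[1] for vec, members in groups.items()}


# Hypothetical log entries, not taken from the DBpedia query log.
sample_log = [
    "SELECT ?s WHERE { ?s a ?c }",
    "SELECT ?s WHERE { ?s a ?c }",
    "SELECT ?s WHERE { ?s ?p ?o }",
    "SELECT DISTINCT ?s WHERE { ?s ?p ?o OPTIONAL { ?s ?q ?l } }",
]
selected = select_benchmark_queries(sample_log)
```

Here `selected` maps each observed feature vector to a single query template, which could then be run against each triple store under test. Grouping by exact feature vector is deliberately coarse; a similarity-based clustering would merge near-duplicate queries that this sketch keeps apart.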
