Congenial Benchmarking of RDF Storage Solutions

Many SPARQL benchmark generation techniques rely on SPARQL query templates or select representative queries from a set of input queries by inspecting their syntactic features. Hence, the prototype queries of such benchmarks mainly capture combinations of SPARQL features, but neither the semantics of the queries nor the conceptual associations between them. We present congenial benchmarks, a novel type of benchmark that detects conceptual associations and can thus reflect prototypical user intentions when selecting prototype queries. We study SPARROW, an instantiation of congenial benchmarks in which the conceptual association of SPARQL queries is measured by concept similarity measures. To this end, we transform unary acyclic conjunctive SPARQL queries into ELH description logic concepts. Our evaluation of three popular triple stores on two datasets shows that the benchmarks generated by SPARROW differ considerably from those generated by a feature-based approach. Moreover, the evaluation suggests that, by exploiting conceptual associations to detect prototypical user needs, SPARROW can characterize the performance of common triple stores with respect to those needs.
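The transformation of unary acyclic conjunctive queries into description logic concepts is commonly done by the classic "rolling-up" technique: starting at the answer variable, class atoms become concept names and each outgoing triple pattern becomes an existential restriction over the rolled-up subtree. The following Python snippet is a minimal illustrative sketch of that idea (not the authors' implementation); the triple-pattern encoding, names such as `roll_up`, and the example query are assumptions for illustration only.

```python
# Illustrative sketch of rolling up a unary acyclic conjunctive SPARQL
# query into an ELH-style concept expression. Triple patterns are
# (subject, predicate, object) tuples; variables start with '?'.

def roll_up(patterns, var):
    """Roll up the query tree rooted at `var` into an ELH concept string."""
    conjuncts = []
    for s, p, o in patterns:
        if s != var:
            continue
        if p == "rdf:type":
            conjuncts.append(o)  # class atom contributes a concept name
        else:
            # role atom contributes an existential restriction over the
            # rolled-up concept of the object's subtree
            filler = roll_up(patterns, o) if o.startswith("?") else o
            conjuncts.append(f"∃{p}.{filler}")
    return "(" + " ⊓ ".join(conjuncts) + ")" if conjuncts else "⊤"

# Query: SELECT ?x WHERE { ?x rdf:type :Film . ?x :director ?y .
#                          ?y rdf:type :Person }
query = [("?x", "rdf:type", ":Film"),
         ("?x", ":director", "?y"),
         ("?y", "rdf:type", ":Person")]
print(roll_up(query, "?x"))  # → (:Film ⊓ ∃:director.(:Person))
```

The resulting concept expressions can then be compared with a concept similarity measure for ELH to obtain the conceptual association between two queries.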
