A Multi-Criteria Experimental Ranking of Distributed SPARQL Evaluators

SPARQL is the standard language for querying RDF data. There exists a variety of SPARQL query evaluation systems implementing different architectures for the distribution of data and computations. Differences in architectures coupled with specific optimizations, for e.g. preprocessing and indexing, make these systems incomparable from a purely theoretical perspective. This results in many implementations solving the SPARQL query evaluation problem while exhibiting very different behaviors, not all of them being adapted in any context. We provide a new perspective on distributed SPARQL evaluators, based on multi-criteria experimental rankings. Our suggested set of 5 features (namely velocity, immediacy, dynamicity, parsimony, and resiliency) provides a more comprehensive description of the behaviors of distributed evaluators when compared to traditional runtime performance metrics. We show how these features help in more accurately evaluating to which extent a given system is appropriate for a given use case. For this purpose, we systematically benchmarked a panel of 10 state-of-the-art implementations. We ranked them using a reading grid that helps in pinpointing the advantages and limitations of current technologies for the distributed evaluation of SPARQL queries.

[1]  N. Shadbolt,et al.  4store: The Design and Implementation of a Clustered RDF Store , 2009 .

[2]  Z. Meral Özsoyoglu,et al.  RBench: Application-Specific RDF Benchmarking , 2015, SIGMOD Conference.

[3]  Andreas Harth,et al.  CumulusRDF: Linked Data Management on Nested Key-Value Stores , 2011 .

[4]  Guillaume Blin,et al.  A survey of RDF storage approaches , 2012, ARIMA J..

[5]  Adina Crainiceanu,et al.  Rya: a scalable RDF triple store for the clouds , 2012, Cloud-I '12.

[6]  Jens Lehmann,et al.  DBpedia SPARQL Benchmark - Performance Assessment with Real Queries on Real Data , 2011, SEMWEB.

[7]  Michael J. Franklin,et al.  Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.

[8]  Ravi Kumar,et al.  Pig latin: a not-so-foreign language for data processing , 2008, SIGMOD Conference.

[9]  Daniel J. Abadi,et al.  Scalable Semantic Web Data Management Using Vertical Partitioning , 2007, VLDB.

[10]  Georg Lausen,et al.  PigSPARQL: mapping SPARQL to Pig Latin , 2011, SWIM '11.

[11]  Jeff Heflin,et al.  LUBM: A benchmark for OWL knowledge base systems , 2005, J. Web Semant..

[12]  Paul T. Groth,et al.  NoSQL Databases for RDF: An Empirical Evaluation , 2013, International Semantic Web Conference.

[13]  Christian Bizer,et al.  The Berlin SPARQL Benchmark , 2009, Int. J. Semantic Web Inf. Syst..

[14]  Prashant Malik,et al.  Cassandra: a decentralized structured storage system , 2010, OPSR.

[15]  M. Tamer Özsu,et al.  Diversified Stress Testing of RDF Data Management Systems , 2014, SEMWEB.

[16]  Ioana Manolescu,et al.  RDF in the clouds: a survey , 2014, The VLDB Journal.

[17]  Pierre Genevès,et al.  SPARQLGX: Efficient Distributed Evaluation of SPARQL with Apache Spark , 2016, International Semantic Web Conference.

[18]  Hairong Kuang,et al.  The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).

[19]  Gianluca Demartini,et al.  BowlognaBench - Benchmarking RDF Analytics , 2011, SIMPDA.

[20]  Georg Lausen,et al.  S2RDF: RDF Querying with SPARQL on Spark , 2015, Proc. VLDB Endow..

[21]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[22]  François Goasdoué,et al.  CliqueSquare: Flat plans for massively parallel RDF queries , 2015, 2015 IEEE 31st International Conference on Data Engineering.

[23]  Georg Lausen,et al.  SP^2Bench: A SPARQL Performance Benchmark , 2008, 2009 IEEE 25th International Conference on Data Engineering.

[24]  Ghislain Auguste Atemezing,et al.  Benchmarking Commercial RDF Stores with Publications Office Dataset , 2018, GeoLD-QuWeDa@ESWC.

[25]  Georg Lausen,et al.  SP2Bench: A SPARQL Performance Benchmark , 2008, Semantic Web Information Management.

[26]  Josep-Lluís Larriba-Pey,et al.  The linked data benchmark council: a graph and RDF industry benchmarking effort , 2014, SGMD.

[27]  Misumi Sadler 1.1. Overview , 2007 .