An Empirical Evaluation of RDF Graph Partitioning Techniques

With the significant growth of RDF data sources in both number and volume comes the need to improve the scalability of RDF storage and querying solutions. Current implementations employ various RDF graph partitioning techniques. However, choosing the most suitable partitioning technique for a given RDF graph and application is not a trivial task. To the best of our knowledge, no detailed empirical evaluation of the performance of these techniques exists. In this work, we present an empirical evaluation of RDF graph partitioning techniques applied to real-world RDF data sets and benchmark queries. We evaluate the selected techniques in terms of their partitioning time, partitioning imbalance (in partition sizes), and query runtime performance, using real-world data sets and queries selected with the FEASIBLE benchmark generation framework.
