Koral: A Glass Box Profiling System for Individual Components of Distributed RDF Stores

In recent years, scalable RDF stores in the cloud have been developed, adding complexity beyond that of RDF stores running on a single computer. To gain a deeper understanding of how, e.g., the data placement or the distributed query execution strategies affect performance, we have developed the modular glass box profiling system Koral. With its help, it is possible to test the behaviour of existing or newly created strategies that tackle the challenges caused by distribution in a realistic distributed RDF store. Koral's design goal is that only the component under evaluation needs to be exchanged, while the adaptation of other components is kept to a minimum. The wide variety of measurements allows for an in-depth investigation of the performance. With Koral we analysed the impact of the three most commonly used data placement strategies and found that balancing the query workload reduces the query execution time more than reducing the data transfer.
