Querying in a Workload-Aware Triplestore Based on NoSQL Databases

RDF and SPARQL are increasingly used for information management across a broad range of organizations (e.g., governments, corporations, and startups). Scalable SPARQL querying has been the main challenge for virtually all recent RDF triplestores. This paper presents WA-RDF, a middleware for workload-adaptive management of large RDF graphs. WA-RDF not only employs the most widely used NoSQL data models but also provides a novel RDF data partitioning approach, based on a fragmentation strategy that maps RDF data to multiple NoSQL databases. This workload-aware partitioning scheme, in turn, enables efficient processing of SPARQL queries over these NoSQL databases. Our experimental evaluation shows that the solution is promising: it outperforms three recent baselines.
