A Middleware for Polyglot Persistence of RDF Data into NoSQL Databases

Software engineers today can choose from a multitude of storage solutions and data formats to achieve better performance, lower cost, or to exploit the expressive power of a particular data model when developing an application. We call this polyglot access. Nevertheless, the cost of developing polyglot software increases due to, for instance, the complexity of managing connections to multiple databases and the need to train people in different tools, models and query languages. This paper presents a scalable middleware, called WA-RDF, that provides a single gateway to multiple NoSQL databases. Unlike similar approaches, WA-RDF uses the well-known abstractions of the Semantic Web to store RDF data in, and query it from, key/value, document and graph databases. Moreover, WA-RDF includes workload-awareness, fragmentation and partitioning components to match the high level of scalability of NoSQL databases. An experimental evaluation shows that the approach is promising: it scaled linearly with growth in dataset size and query frequency, and outperformed a multimodel database in the tested use cases.
