Data variety, come as you are in multi-model data warehouses

Abstract

Multi-model DBMSs (MMDBMSs) have recently been introduced to store and seamlessly query heterogeneous data (structured, semi-structured, graph-based, etc.) in their native form, with the aim of preserving their variety. Unfortunately, when it comes to analyzing these data, traditional data warehouses (DWs) and OLAP systems fall short because they rely on relational DBMSs for storage and querying, thus forcing data variety into the rigidity of a structured, fixed schema. In this paper, we investigate the performance of an MMDBMS when used to store multidimensional data for OLAP analyses. A multi-model DW would store each of its elements according to its native model; among the benefits we envision for this solution are bridging the architectural gap between data lakes and DWs, reducing the cost of ETL, and ensuring better flexibility, extensibility, and evolvability thanks to the combined use of structured and schemaless data. To support our investigation, we define a multidimensional schema for the UniBench benchmark dataset and an ad hoc OLAP workload for it. We then propose and compare three logical solutions implemented on the PostgreSQL multi-model DBMS: one that extends a star schema with JSON, XML, graph-based, and key–value data; one based on a classical (fully relational) star schema; and one where all data are kept in their native form (no relational data are introduced). As expected, the fully relational implementation generally performs better than the multi-model one, but this is balanced by the benefits of MMDBMSs in dealing with variety. Finally, we give our perspective on future research on this topic.
