Towards a Multi-Model Approach to Support User-Driven Extensibility in Data Warehouses: Agro-ecology Case Study

Information systems have evolved into complex data platforms that support end-to-end data-intensive needs and aim to cope with the different V’s that characterize Big Data. In particular, multi-model databases (MMDBs) have been proposed to natively store and query data in different (schemaless) models, so as to better handle Variety. In this work we envision a new data warehouse architecture in which an MMDB enables on-the-fly, user-driven extensions of multidimensional cubes with additional data, while supporting variable and complex data and keeping the impact on ETL low. After presenting the architecture with the aid of a case study on the management of emerging plant disease, we discuss the main open issues associated with it.
