NoSQL Multidimensional Data Warehouses

Data in online analytical processing (OLAP) systems are traditionally managed by relational databases. Unfortunately, managing big data (very large volumes of data) with them becomes difficult. In such a context, "Not-Only SQL" (NoSQL) environments offer an alternative that can scale out while keeping some flexibility for an OLAP system. We therefore define rules for converting a star schema, together with its optimization, the lattice of precomputed aggregates, into two NoSQL logical models: column-oriented and document-oriented. Using these rules, we implement and analyze two decision-support systems, one per model, built with MongoDB and HBase. We compare them on three phases: data loading (with data generated by the TPC-DS benchmark), lattice computation, and querying.
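The abstract does not spell out the conversion rules or how the lattice is computed. As a rough illustration only, the Python sketch below assumes a hypothetical toy sales star schema: `DIMENSIONS`, `FACTS`, the field names, and every function are invented for this example and are not the paper's rules. It maps one fact row into a nested document and into HBase-style `family:qualifier` cells, then computes the full lattice of SUM aggregates, deriving each coarser cuboid from an already-computed parent that has one more dimension. It uses plain Python structures, not the MongoDB or HBase client APIs.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy star schema: one fact (sales) with three dimensions.
DIMENSIONS = ("customer", "product", "date")
MEASURES = ("quantity", "amount")
FACTS = [
    {"customer": "C1", "product": "P1", "date": "2014-01", "quantity": 2, "amount": 20.0},
    {"customer": "C1", "product": "P2", "date": "2014-01", "quantity": 1, "amount": 15.0},
    {"customer": "C2", "product": "P1", "date": "2014-02", "quantity": 3, "amount": 30.0},
]


def to_document(row, fact_id):
    """Document-oriented sketch (assumed layout): one document per fact,
    measures and each dimension stored as nested sub-documents."""
    return {
        "_id": fact_id,
        "sales": {m: row[m] for m in MEASURES},
        **{dim: {"id": row[dim]} for dim in DIMENSIONS},
    }


def to_column_cells(row, fact_id):
    """Column-oriented sketch (assumed layout): one row key per fact, cells
    named 'family:qualifier', one family for the fact and one per dimension."""
    cells = {f"sales:{m}": row[m] for m in MEASURES}
    for dim in DIMENSIONS:
        cells[f"{dim}:id"] = row[dim]
    return fact_id, cells


def lattice(dimensions):
    """All 2^n group-by sets, ordered from finest (all dimensions) to ALL (empty)."""
    return [combo for k in range(len(dimensions), -1, -1)
            for combo in combinations(dimensions, k)]


def aggregate(rows, group_by, measures=MEASURES):
    """SUM each measure over the given subset of dimensions."""
    totals = defaultdict(lambda: {m: 0 for m in measures})
    for row in rows:
        key = tuple(row[d] for d in group_by)
        for m in measures:
            totals[key][m] += row[m]
    return dict(totals)


def compute_lattice(facts, dimensions, measures=MEASURES):
    """Compute every cuboid; coarser cuboids reuse a finer parent cuboid."""
    cuboids = {}
    for group_by in lattice(dimensions):
        if len(group_by) == len(dimensions):
            source = facts  # the finest cuboid is built from the raw facts
        else:
            # Parent = this group-by plus one missing dimension; it is already
            # computed because the lattice is walked from finest to coarsest.
            extra = next(d for d in dimensions if d not in group_by)
            parent = tuple(d for d in dimensions if d in group_by or d == extra)
            source = [dict(zip(parent, key), **vals)
                      for key, vals in cuboids[parent].items()]
        cuboids[group_by] = aggregate(source, group_by, measures)
    return cuboids


if __name__ == "__main__":
    print(to_document(FACTS[0], 1))
    print(to_column_cells(FACTS[0], 1))
    print(compute_lattice(FACTS, DIMENSIONS)[("customer",)])
```

Reusing a parent cuboid instead of rescanning the raw facts is only valid for distributive measures such as SUM or COUNT. With n dimensions the lattice holds 2^n cuboids, which is why its computation cost is worth comparing across storage models.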
