Benchmark for OLAP on NoSQL technologies comparing NoSQL multidimensional data warehousing solutions

The plethora of data warehouse solutions has created a need to compare these solutions using experimental benchmarks. Existing benchmarks rely mostly on the relational data model and do not take other models into account. In this paper, we propose an extension to a popular benchmark (the Star Schema Benchmark, or SSB) that considers non-relational NoSQL models. To avoid the data post-processing otherwise required to load this data into NoSQL systems, the data is generated directly in different formats. To make the best use of horizontal scaling, data can be produced in a distributed file system, which removes disk or partition size as a limit on the generated dataset. Experimental results show improved performance of our new benchmark.
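To illustrate what generating the data in different formats can mean in practice, the following minimal sketch writes the same fact once as a flat CSV row (relational style) and once as a nested JSON document with the customer dimension embedded (document-oriented style). It is an assumption-laden illustration, not the generator described in the paper: the column names follow SSB naming, while the sample values and the helper names `to_csv_row` and `to_document` are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample rows: column names follow SSB naming, values are invented.
customer = {"c_custkey": 7, "c_name": "Customer#7", "c_city": "PARIS", "c_nation": "FRANCE"}
lineorder = {"lo_orderkey": 1, "lo_custkey": 7, "lo_quantity": 3, "lo_revenue": 4500}


def to_csv_row(row):
    """Serialize one fact row as a flat CSV line (relational-style output)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row.values())
    return buf.getvalue().rstrip("\n")


def to_document(fact, dim):
    """Serialize the fact as one JSON document with the dimension embedded.

    The foreign key is dropped because the customer attributes are nested
    directly in the document, so no join is needed at load or query time.
    """
    doc = {k: v for k, v in fact.items() if k != "lo_custkey"}
    doc["customer"] = dict(dim)
    return json.dumps(doc, sort_keys=True)


csv_line = to_csv_row(lineorder)
json_line = to_document(lineorder, customer)
print(csv_line)
print(json_line)
```

Emitting the target format directly means that a document store can ingest the JSON lines as they are, whereas a relational dump would first have to be joined and reshaped.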
