CURE for cubes: cubing using a ROLAP engine

Data cube construction has been the focus of much research due to its importance in improving efficiency of OLAP. A significant fraction of this work has been on ROLAP techniques, which are based on relational technology. Existing ROLAP cubing solutions mainly focus on "flat" datasets, which do not include hierarchies in their dimensions. Nevertheless, the nature of hierarchies introduces several complications into cube construction, making existing techniques essentially inapplicable in a significant number of real-world applications. In particular, hierarchies raise three main challenges: (a) The number of nodes in a cube lattice increases dramatically and its shape is more involved. These require new forms of lattice traversal for efficient execution. (b) The number of unique values in the higher levels of a dimension hierarchy may be very small; hence, partitioning data into fragments that fit in memory and include all entries of a particular value may often be impossible. This requires new partitioning schemes. (c) The number of tuples that need to be materialized in the final cube increases dramatically. This requires new storage schemes that remove all forms of redundancy for efficient space utilization. In this paper, we propose CURE, a novel ROLAP cubing method that addresses these issues and constructs complete data cubes over very large datasets with arbitrary hierarchies. CURE contributes a novel lattice traversal scheme, an optimized partitioning method, and a suite of relational storage schemes for all forms of redundancy. We demonstrate the effectiveness of CURE through experiments on both real-world and synthetic datasets. Among the experimental results, we distinguish those that have made CURE the first ROLAP technique to complete the construction of the cube of the highest-density dataset in the APB-1 benchmark (12 GB). CURE was in fact quite efficient on this, showing great promise with respect to the potential of the technique overall.

[1]  Hongjun Lu,et al.  Condensed cube: an effective approach to reducing data cube size , 2002, Proceedings 18th International Conference on Data Engineering.

[2]  Douglas R. McGregor,et al.  Elimination of Redundant Views in Multidimensional Aggregates , 2000, DaWaK.

[3]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[4]  Laks V. S. Lakshmanan,et al.  Quotient Cube: How to Summarize the Semantics of a Data Cube , 2002, VLDB.

[5]  Anthony K. H. Tung,et al.  Incremental maintenance of quotient cube for median , 2004, KDD.

[6]  Jeffrey F. Naughton,et al.  An array-based algorithm for simultaneous multidimensional aggregates , 1997, SIGMOD '97.

[7]  Yannis Sismanis,et al.  Hierarchical dwarfs for the rollup cube , 2003, DOLAP '03.

[8]  Jiawei Han,et al.  High-Dimensional OLAP: A Minimal Cubing Approach , 2004, VLDB.

[9]  Sunita Sarawagi,et al.  On computing the data cube , 1996 .

[10]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[11]  Jian Pei,et al.  Efficient computation of Iceberg cubes with complex measures , 2001, SIGMOD '01.

[12]  Raghu Ramakrishnan,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999, SIGMOD '99.

[13]  Laks V. S. Lakshmanan,et al.  QC-trees: an efficient summary structure for semantic OLAP , 2003, SIGMOD '03.

[14]  Kenneth A. Ross,et al.  Fast Computation of Sparse Datacubes , 1997, VLDB.

[15]  Yannis Sismanis,et al.  The Complexity of Fully Materialized Coalesced Cubes , 2004, VLDB.

[16]  Jiawei Han,et al.  Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration , 2003, Very Large Data Bases Conference.

[17]  RamakrishnanRaghu,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999 .

[18]  Jiawei Han,et al.  MM-Cubing: computing Iceberg cubes by factorizing the lattice space , 2004, Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004..

[19]  Jeffrey F. Naughton,et al.  On the Computation of Multidimensional Aggregates , 1996, VLDB.

[20]  Zhimin Chen,et al.  Efficient computation of multiple group by queries , 2005, SIGMOD '05.

[21]  Yannis Sismanis,et al.  Dwarf: shrinking the PetaCube , 2002, SIGMOD '02.

[22]  Timos K. Sellis,et al.  CUBE File: A File Structure for Hierarchically Clustered OLAP Cubes , 2004, EDBT.

[23]  Laks V. S. Lakshmanan,et al.  What can Hierarchies do for Data Warehouses? , 1999, VLDB.

[24]  Divyakant Agrawal,et al.  Range cube: efficient cube computation by exploiting data correlation , 2004, Proceedings. 20th International Conference on Data Engineering.