Condensed cube: an effective approach to reducing data cube size

Pre-computed data cube facilitates OLAP (on-line analytical processing). It is well-known that data cube computation is an expensive operation. While most algorithms have been devoted to optimizing memory management and reducing computation costs, less work has addressed a fundamental issue: the size of a data cube is huge when a large base relation with a large number of attributes is involved. In this paper, we propose a new concept, called a condensed data cube. The condensed cube is of much smaller size than a complete non-condensed cube. More importantly, it is a fully pre-computed cube without compression, and, hence, it requires neither decompression nor further aggregation when answering queries. Several algorithms for computing a condensed cube are proposed. Results of experiments on the effectiveness of condensed data cube are presented, using both synthetic and real-world data. The results indicate that the proposed condensed cube can reduce both the cube size and therefore its computation time.

[1]  Sin Yeung Lee,et al.  Hierarchical Compact Cube for Range-Max Queries , 2000, VLDB.

[2]  Dimitrios Gunopulos,et al.  Approximating multi-dimensional aggregate range queries over real attributes , 2000, SIGMOD '00.

[3]  Terence R. Smith,et al.  Relative prefix sums: an efficient approach for querying dynamic OLAP data cubes , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[4]  Doron Rotem,et al.  Bit Transposed Files , 1985, VLDB.

[5]  Mark Sullivan,et al.  Quasi-cubes: exploiting approximations in multidimensional databases , 1997, SGMD.

[6]  Raghu Ramakrishnan,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999, SIGMOD '99.

[7]  Jeffrey F. Naughton,et al.  On the Computation of Multidimensional Aggregates , 1996, VLDB.

[8]  Hamid Pirahesh,et al.  Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals , 1996, Data Mining and Knowledge Discovery.

[9]  Kenneth A. Ross,et al.  Serving datacube tuples from main memory , 2000, Proceedings. 12th International Conference on Scientific and Statistica Database Management.

[10]  Viswanath Poosala,et al.  Fast approximate answers to aggregate queries on a data cube , 1999, Proceedings. Eleventh International Conference on Scientific and Statistical Database Management.

[11]  Jaideep Srivastava,et al.  Aggregation Algorithms for Very Large Compressed Data Warehouses , 1999, VLDB.

[12]  Paul S. Bradley,et al.  Compressed data cubes for OLAP aggregate query approximation on continuous dimensions , 1999, KDD '99.

[13]  Jeffrey F. Naughton,et al.  An array-based algorithm for simultaneous multidimensional aggregates , 1997, SIGMOD '97.

[14]  Kenneth A. Ross,et al.  Fast Computation of Sparse Datacubes , 1997, VLDB.

[15]  Yossi Matias,et al.  DIMACS Series in Discrete Mathematicsand Theoretical Computer Science Synopsis Data Structures for Massive Data , 2007 .

[16]  Viswanath Poosala,et al.  Congressional samples for approximate answering of group-by queries , 2000, SIGMOD '00.

[17]  Jeffrey Scott Vitter,et al.  Data cube approximation and histograms via wavelets , 1998, CIKM '98.

[18]  Forouzan Golshani,et al.  Proceedings of the Eighth International Conference on Data Engineering , 1992 .

[19]  Balakrishna R. Iyer,et al.  Data Compression Support in Databases , 1994, VLDB.