Paper information - Condensed cube: an effective approach to reducing data cube size

A pre-computed data cube facilitates OLAP (on-line analytical processing). It is well known that data cube computation is an expensive operation. While most algorithms have been devoted to optimizing memory management and reducing computation costs, less work has addressed a fundamental issue: a data cube becomes huge when the base relation is large and has many attributes. In this paper, we propose a new concept, called a condensed data cube. A condensed cube is much smaller than the complete, non-condensed cube. More importantly, it is a fully pre-computed cube without compression, and hence it requires neither decompression nor further aggregation when answering queries. Several algorithms for computing a condensed cube are proposed. Results of experiments on the effectiveness of the condensed data cube are presented, using both synthetic and real-world data. The results indicate that the proposed condensed cube can reduce the cube size and, as a result, its computation time.
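To make the size issue concrete, the following minimal Python sketch materializes every group-by of a tiny base relation and counts the resulting cube tuples. The relation, the dimension names, and the helper `full_cube_groups` are hypothetical and chosen only for illustration; this is not one of the paper's algorithms. The count of groups backed by a single base tuple is included only as a rough indicator of how much of a complete cube can be repetitive, not as the paper's condensing rule.

```python
from collections import Counter
from itertools import combinations


def full_cube_groups(rows, num_dims):
    """Map each subset of dimension indexes to a Counter of
    group key -> number of base tuples falling in that group."""
    cube = {}
    for k in range(num_dims + 1):
        for subset in combinations(range(num_dims), k):
            cube[subset] = Counter(tuple(row[i] for i in subset) for row in rows)
    return cube


# Hypothetical toy base relation with three dimensions (measures omitted).
dims = ("store", "product", "day")
rows = [
    ("s1", "p1", "d1"),
    ("s1", "p2", "d1"),
    ("s2", "p1", "d2"),
]

cube = full_cube_groups(rows, len(dims))

# A cube over n dimensions has 2**n cuboids (one per subset of dimensions).
num_cuboids = len(cube)
# Total number of tuples stored by a complete, non-condensed cube.
num_cube_tuples = sum(len(groups) for groups in cube.values())
# Cube tuples that aggregate exactly one base tuple.
single_tuple_groups = sum(
    1 for groups in cube.values() for count in groups.values() if count == 1
)

print(num_cuboids, num_cube_tuples, single_tuple_groups)
```

Even with three base tuples and three dimensions, the complete cube holds several times more tuples than the base relation, and the number of cuboids doubles with every added dimension.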