Paper information - Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach


Data cube computation is one of the most essential but expensive operations in data warehousing. Previous studies have developed two major approaches: top-down and bottom-up. The former, represented by the multiway array cube (MultiWay) algorithm, aggregates simultaneously on multiple dimensions; however, it cannot take advantage of a priori pruning when computing iceberg cubes (cubes that contain only aggregate cells whose measure values satisfy a threshold, called the iceberg condition). The latter, represented by BUC, computes the iceberg cube bottom-up and facilitates a priori pruning. BUC explores fast sorting and partitioning techniques; however, it does not fully explore multidimensional simultaneous aggregation. In this paper, we present a new method, star-cubing, that integrates the strengths of the previous two algorithms and performs aggregations on multiple dimensions simultaneously. It utilizes a star-tree structure, extends the simultaneous aggregation methods, and enables the pruning of the group-bys that do not satisfy the iceberg condition. Our performance study shows that star-cubing is highly efficient and outperforms the previous methods.
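To make the iceberg condition and the pruning idea concrete, here is a minimal Python sketch in the spirit of the BUC-style partitioning the abstract describes. It is not the star-cubing algorithm and does not use a star-tree. The function name `iceberg_cube`, the parameters `rows`, `num_dims`, and `min_count`, and the choice of a simple count as the measure are all assumptions made for illustration.

```python
from collections import defaultdict


def iceberg_cube(rows, num_dims, min_count):
    """Return {cell: count} for every cell whose count is >= min_count.

    A cell is a tuple of length num_dims; '*' marks an aggregated dimension.
    Hypothetical helper: the measure is assumed to be a plain row count.
    """
    result = {}

    def expand(partition, cell, start_dim):
        # Callers only pass partitions that already meet the threshold.
        result[tuple(cell)] = len(partition)
        for dim in range(start_dim, num_dims):
            # Partition the current rows by their value on this dimension.
            groups = defaultdict(list)
            for row in partition:
                groups[row[dim]].append(row)
            for value, group in groups.items():
                # Pruning: a cell below the threshold cannot contain a more
                # specific cell above it, so the whole branch is skipped.
                if len(group) >= min_count:
                    cell[dim] = value
                    expand(group, cell, dim + 1)
            cell[dim] = '*'  # restore before moving to the next dimension

    if len(rows) >= min_count:
        expand(rows, ['*'] * num_dims, 0)
    return result


rows = [("a1", "b1"), ("a1", "b2"), ("a1", "b1"), ("a2", "b1")]
cube = iceberg_cube(rows, num_dims=2, min_count=2)
for cell, count in sorted(cube.items()):
    print(cell, count)
```

With `min_count=2`, cells such as `("a2", "*")` and `("*", "b2")` are never expanded, which is the saving that pruning provides. The sketch still aggregates one dimension at a time, which is the limitation of the bottom-up approach that the abstract says star-cubing is designed to address.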
