Efficient computation of Iceberg cubes with complex measures

It is often too expensive to compute and materialize a complete high-dimensional data cube. Computing an iceberg cube, which contains only aggregates above certain thresholds, is an effective way to derive nontrivial multi-dimensional aggregations for OLAP and data mining. In this paper, we study efficient methods for computing iceberg cubes with some popularly used complex measures, such as average, and develop a methodology that adopts a weaker but anti-monotonic condition for testing and pruning search space. In particular, for efficient computation of iceberg cubes with the average measure, we propose a top-k average pruning method and extend two previously studied methods, Apriori and BUC, to Top-k Apriori and Top-k BUC. To further improve the performance, an interesting hypertree structure, called H-tree, is designed and a new iceberg cubing method, called Top-k H-Cubing, is developed. Our performance study shows that Top-k BUC and Top-k H-Cubing are two promising candidates for scalable computation, and Top-k H-Cubing has better performance in most cases.

[1]  Laks V. S. Lakshmanan,et al.  Optimization of constrained frequent set queries with 2-variable constraints , 1999, SIGMOD '99.

[2]  Jeffrey F. Naughton,et al.  On the Computation of Multidimensional Aggregates , 1996, VLDB.

[3]  Laks V. S. Lakshmanan,et al.  Exploratory mining and pruning optimizations of constrained associations rules , 1998, SIGMOD '98.

[4]  H. Garcia-Molina,et al.  Computing Iceberg Queries E ciently , 1998 .

[5]  Surajit Chaudhuri,et al.  An overview of data warehousing and OLAP technology , 1997, SGMD.

[6]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[7]  Ramakrishnan Srikant,et al.  Mining Association Rules with Item Constraints , 1997, KDD.

[8]  Dimitrios Gunopulos,et al.  Constraint-Based Rule Mining in Large, Dense Databases , 1999, Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337).

[9]  Jeffrey D. Ullman,et al.  Implementing data cubes efficiently , 1996, SIGMOD '96.

[10]  Raghu Ramakrishnan,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999, SIGMOD '99.

[11]  RamakrishnanRaghu,et al.  Bottom-up computation of sparse and Iceberg CUBE , 1999 .

[12]  Jeffrey F. Naughton,et al.  An array-based algorithm for simultaneous multidimensional aggregates , 1997, SIGMOD '97.

[13]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[14]  Kenneth A. Ross,et al.  Fast Computation of Sparse Datacubes , 1997, VLDB.

[15]  Jian Pei,et al.  Can we push more constraints into frequent pattern mining? , 2000, KDD '00.

[16]  Rajeev Motwani,et al.  Computing Iceberg Queries Efficiently , 1998, VLDB.

[17]  Laks V. S. Lakshmanan,et al.  Efficient mining of constrained correlated sets , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[18]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.