Divide-and-approximate: a novel constraint push strategy for iceberg cube mining

Iceberg cube mining computes all cells v, each corresponding to a GROUP BY partition, that satisfy a given constraint on the aggregated behavior of the tuples in that partition. The number of cells is often so large that the result cannot realistically be searched without pushing the constraint into the search. Previous work has pushed antimonotone and monotone constraints. However, many useful constraints are neither antimonotone nor monotone. We consider a general class of aggregate constraints of the form f(v) θ σ, where f is an arithmetic function of SQL-like aggregates and θ is one of <, ≤, ≥, >. We propose a novel pushing technique, called divide-and-approximate, to push such constraints. The idea is to recursively divide the search space and, within each subspace, approximate the given constraint using antimonotone or monotone constraints. This technique applies to a class called separable constraints, which properly contains all constraints built by an arithmetic function f of all SQL aggregates.
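As a rough illustration of the idea, and not the paper's actual algorithm, the sketch below pushes the constraint sum(m) ≥ σ over a measure with mixed signs. This constraint is neither antimonotone nor monotone, but within any partition the sum of the positive values bounds the sum of every sub-cell from above, so the weaker test "positive sum ≥ σ" is antimonotone and can safely prune that subspace. The function name `iceberg_sum`, the row layout, and the BUC-style recursion are assumptions made for this example.

```python
from collections import defaultdict

def iceberg_sum(rows, dims, measure, sigma):
    """Return {cell: total} for every cell whose sum(measure) >= sigma.

    Hypothetical sketch: a cell is a sorted tuple of (dimension, value)
    pairs; the empty tuple is the cell that groups by nothing.
    """
    result = {}

    def recurse(part, cell, start):
        # Check the real (non-antimonotone) constraint on this cell.
        total = sum(r[measure] for r in part)
        if total >= sigma:
            result[tuple(sorted(cell.items()))] = total

        # Antimonotone approximation valid inside this subspace: any
        # sub-cell holds a subset of these tuples, so its sum cannot
        # exceed the sum of the positive values in this partition.
        upper = sum(r[measure] for r in part if r[measure] > 0)
        if upper < sigma:
            return  # no more specific cell can satisfy the constraint

        # Divide: partition on each remaining dimension and recurse.
        for i in range(start, len(dims)):
            groups = defaultdict(list)
            for r in part:
                groups[r[dims[i]]].append(r)
            for value, sub in groups.items():
                recurse(sub, {**cell, dims[i]: value}, i + 1)

    recurse(rows, {}, 0)
    return result

# Made-up data for illustration only.
rows = [
    {"a": "x", "b": "p", "m": 5},
    {"a": "x", "b": "q", "m": -4},
    {"a": "y", "b": "p", "m": 2},
    {"a": "y", "b": "q", "m": -1},
]
cells = iceberg_sum(rows, ["a", "b"], "m", sigma=4)
```

In this toy data the cell a = x fails the constraint (its sum is 1) while the more specific cell a = x, b = p satisfies it, which is why the exact constraint cannot be used for pruning directly. The partition a = y is discarded without enumerating its sub-cells because its positive sum is only 2.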
