Cubegrades: Generalizing Association Rules

Cubegrades are a generalization of association rules. They represent how a set of measures (aggregates) is affected by modifying a cube through specialization (rolldown), generalization (rollup), and mutation (a change in one of the cube's dimensions). Cubegrades are significantly more expressive than association rules in capturing trends and patterns in data because they can use other standard aggregate measures in addition to COUNT. Cubegrades are atoms that can support sophisticated “what if” analysis tasks dealing with the behavior of arbitrary aggregates over different database segments. As such, cubegrades can be useful in marketing, sales analysis, and other typical data mining applications in business.

In this paper we introduce the concept of cubegrades. We define them and give examples of their usage. We then describe in detail an important task in computing cubegrades: the generation of significant cubes, which is analogous to generating frequent sets. A novel Grid Based Pruning (GBP) method is employed for this purpose. We experimentally demonstrate the practicality of the method. We conclude with a number of open questions and possible extensions of the work.
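To make the three cube modifications concrete, the following is a minimal Python sketch, not the paper's algorithm or notation. It assumes a toy fact table and represents a cube as a dictionary of dimension=value descriptors. The column names (`area`, `age`, `sales`), the helper functions `select`, `avg_sales`, and `cubegrade`, and the choice to report the AVG change as a ratio are all hypothetical, introduced only for illustration. It does not implement significant-cube generation or Grid Based Pruning.

```python
from statistics import mean

# Toy fact table; column names and values are invented for illustration.
ROWS = [
    {"area": "urban", "age": "young", "sales": 10.0},
    {"area": "urban", "age": "young", "sales": 14.0},
    {"area": "urban", "age": "old", "sales": 20.0},
    {"area": "rural", "age": "young", "sales": 6.0},
    {"area": "rural", "age": "old", "sales": 8.0},
]


def select(rows, cube):
    """Return the rows matching every dimension=value descriptor of the cube."""
    return [r for r in rows if all(r[d] == v for d, v in cube.items())]


def avg_sales(rows, cube):
    """AVG(sales) over the cube; a measure other than COUNT."""
    return mean(r["sales"] for r in select(rows, cube))


def cubegrade(rows, source, target):
    """Describe how COUNT and AVG(sales) change from the source to the target cube."""
    return {
        "count_source": len(select(rows, source)),
        "count_target": len(select(rows, target)),
        "avg_ratio": avg_sales(rows, target) / avg_sales(rows, source),
    }


source = {"area": "urban"}

# Specialization (rolldown): add a descriptor to the source cube.
specialization = cubegrade(ROWS, source, {"area": "urban", "age": "young"})

# Mutation: change the value of one of the source cube's dimensions.
mutation = cubegrade(ROWS, source, {"area": "rural"})

# Generalization (rollup): drop a descriptor from the source cube.
generalization = cubegrade(ROWS, source, {})

for name, grade in [("specialization", specialization),
                    ("mutation", mutation),
                    ("generalization", generalization)]:
    print(name, grade)
```

Each result pairs the sizes of the source and target cubes with the relative change in the average, which is the kind of "what if" comparison an association rule based on COUNT alone cannot express.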
