An efficient algorithm for finding dense regions for mining quantitative association rules

Many algorithms have been proposed for mining boolean association rules, but very little work has been done on mining quantitative association rules. Quantitative attributes can be transformed into boolean attributes, but this approach is ineffective, is difficult to scale to high-dimensional cases, and may produce many imprecise association rules. Algorithms designed more recently for quantitative association rules still suffer from nonscalability and noise. In this paper, we propose an efficient algorithm, DRMiner. By using the notion of "density" to capture the characteristics of quantitative attributes, together with an efficient procedure for locating the "dense regions", DRMiner not only solves the problems of previous approaches but also scales well to high-dimensional cases. We evaluated DRMiner on synthetic databases. The results show that DRMiner is effective and scales approximately linearly as the number of attributes increases.
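To make the notion of a dense region concrete, the following minimal sketch counts records per cell of a uniform grid over the quantitative attributes and keeps the cells whose count reaches a threshold. It is an illustration only and not the DRMiner procedure, which the abstract does not describe; the function name `dense_cells` and the parameters `cell_width` and `min_density` are hypothetical.

```python
from collections import Counter


def dense_cells(points, cell_width, min_density):
    """Return the grid cells that hold at least min_density points.

    points: iterable of equal-length numeric tuples, one value per attribute.
    cell_width: side length of each grid cell (hypothetical parameter).
    min_density: minimum point count for a cell to be called dense
                 (hypothetical threshold).
    """
    # Map every point to the integer coordinates of the cell containing it.
    counts = Counter(
        tuple(int(value // cell_width) for value in point)
        for point in points
    )
    # Keep only the cells whose count meets the density threshold.
    return {cell: n for cell, n in counts.items() if n >= min_density}


points = [(1.0, 2.0), (1.5, 2.5), (1.2, 2.1), (8.0, 9.0), (4.0, 0.5)]
print(dense_cells(points, cell_width=2.0, min_density=3))  # prints {(0, 1): 3}
```

In this toy data set, three of the five records fall into cell `(0, 1)`, so it is the only cell reported as dense; the two isolated records are ignored, which is how a density threshold can filter out noise.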
