Paper Information - An Efficient Approach for Interactive Mining of Frequent Itemsets

There have been many studies on the efficient discovery of frequent itemsets in large databases. However, it is nontrivial to mine frequent itemsets in interactive settings, where users often change the minimum support threshold (minsup), because a change of minsup may invalidate existing frequent itemsets or introduce new ones. In this paper, we propose an efficient interactive mining technique based on a novel vertical itemset tree (VI-tree) structure. An important feature of our algorithm is that it does not have to re-examine the existing frequent itemsets when minsup is lowered. This feature makes it very efficient for interactive mining. The proposed algorithm has been implemented, and its performance is compared with re-running Eclat, a vertical mining algorithm, under different minsup values. Experimental results show that our algorithm is, on average, more than two orders of magnitude faster than re-running Eclat.
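To make the setting concrete, the following is a minimal sketch of the baseline the abstract compares against: re-running an Eclat-style vertical miner for each minsup value. It is not the VI-tree algorithm, whose details are not given here. The function name `eclat`, the toy transactions, and the use of absolute support counts are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def eclat(transactions: List[Set[str]], minsup: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset whose support count is at least `minsup`.

    Illustrative Eclat-style miner (hypothetical helper, not the paper's code).
    """
    # Vertical layout: map each item to the set of transaction ids containing it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    result: Dict[FrozenSet[str], int] = {}

    def extend(prefix: FrozenSet[str], candidates: List[Tuple[str, Set[int]]]) -> None:
        # Depth-first search: grow `prefix` by one item at a time.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix | {item}
            result[itemset] = len(tids)
            # Intersect tidsets with the remaining candidates to get supports.
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids
                if len(shared) >= minsup:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    frequent_items = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= minsup),
        key=lambda pair: pair[0],
    )
    extend(frozenset(), frequent_items)
    return result


# Toy database (assumed data, five transactions).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]

high = eclat(transactions, minsup=3)  # stricter threshold
low = eclat(transactions, minsup=2)   # user lowers the threshold

# Itemsets that only become frequent after minsup is lowered.
newly_frequent = set(low) - set(high)
print(sorted("".join(sorted(s)) for s in newly_frequent))
```

In this toy run, every itemset that is frequent at the higher threshold is still frequent, with the same support, at the lower one, so only the newly frequent itemsets represent new work. Re-running the miner from scratch recomputes all of them anyway, which is the redundancy an interactive approach aims to avoid.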

Xin Li | Shiwei Tang | Zhi-Hong Deng
