Paper information - Efficient single-pass frequent pattern mining using a prefix-tree

The FP-growth algorithm, which uses the FP-tree, has been widely studied for frequent pattern mining because it can dramatically improve performance compared to the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans, which is not consistent with efficient data stream processing. In this paper, we present a novel tree structure, called the CP-tree (compact pattern tree), that captures database information with one scan (insertion phase) and provides the same mining performance as the FP-growth method (restructuring phase). The CP-tree introduces the concept of dynamic tree restructuring to produce a highly compact frequency-descending tree structure at runtime. An efficient tree restructuring method, called the branch sorting method, that restructures a prefix-tree branch by branch is also proposed in this paper. Moreover, the CP-tree provides full functionality for interactive and incremental mining. Extensive experimental results show that the CP-tree is efficient for frequent pattern mining, interactive mining, and incremental mining with a single database scan.
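
The two phases named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not spell out the branch sorting method, so the restructuring step here simply extracts every stored path and reinserts it in frequency-descending order. The names `Node`, `SinglePassTree`, `insert`, `restructure`, and `size` are hypothetical and chosen only for this illustration.

```python
from collections import Counter


class Node:
    """One prefix-tree node: an item, a support count, and child links."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class SinglePassTree:
    """Illustrative prefix tree built in one scan (hypothetical names)."""

    def __init__(self):
        self.root = Node()
        self.freq = Counter()  # global item frequencies
        self.rank = {}         # item -> position in the current item order

    def insert(self, transaction, count=1):
        """Insertion phase: add a transaction using the current item order."""
        items = list(dict.fromkeys(transaction))  # drop duplicates, keep order
        for item in items:
            self.freq[item] += count
            self.rank.setdefault(item, len(self.rank))  # new items go last
        node = self.root
        for item in sorted(items, key=self.rank.__getitem__):
            node = node.children.setdefault(item, Node(item))
            node.count += count

    def _paths(self, node, prefix):
        """Yield (path, count) for the transactions that end at each node."""
        ended = node.count - sum(c.count for c in node.children.values())
        if prefix and ended > 0:
            yield prefix, ended
        for child in node.children.values():
            yield from self._paths(child, prefix + [child.item])

    def restructure(self):
        """Restructuring phase (simplified): rebuild in frequency-descending
        order. This is a plain rebuild, NOT the branch sorting method."""
        paths = list(self._paths(self.root, []))
        ordered = sorted(self.freq, key=lambda i: (-self.freq[i], self.rank[i]))
        self.rank = {item: pos for pos, item in enumerate(ordered)}
        self.root, self.freq = Node(), Counter()
        for path, count in paths:
            self.insert(path, count)

    def size(self):
        """Number of nodes, excluding the root."""
        stack, total = list(self.root.children.values()), 0
        while stack:
            node = stack.pop()
            total += 1
            stack.extend(node.children.values())
        return total


tree = SinglePassTree()
for tx in [["a", "b"], ["b", "c"], ["b", "d"], ["b", "a"], ["c", "b"]]:
    tree.insert(tx)
before = tree.size()
tree.restructure()
print(before, tree.size())
```

In this toy data set the most frequent item, `b`, arrives after `a`, so the tree built in arrival order stores `b` on two separate branches. After restructuring, `b` sits at the top and its prefix is shared, which is the compaction effect the abstract attributes to the frequency-descending order.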
