Paper Information - A High-Performance Algorithm for Frequent Itemset Mining

A High-Performance Algorithm for Frequent Itemset Mining

Frequent itemsets, also called frequent patterns, carry important information about databases, and mining them efficiently is a core problem in data mining. Pattern growth approaches, such as the classic FP-Growth algorithm and the efficient FPgrowth* algorithm, can solve this problem. These approaches mine frequent itemsets by recursively constructing conditional databases, which are usually represented by prefix-trees. The three major costs of such approaches are prefix-tree traversal, support counting, and prefix-tree construction. This paper presents a novel pattern growth algorithm called BFP-growth in which all three costs are greatly reduced. We compare these costs among BFP-growth, FP-Growth, and FPgrowth*, and show that BFP-growth incurs the lowest. Experimental data show that, on various databases, BFP-growth outperforms not only FP-Growth and FPgrowth* but also several well-known algorithms, including dEclat and LCM, which are among the fastest.
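To make the idea of recursively constructing conditional databases concrete, the following is a minimal Python sketch of generic pattern growth. It is not BFP-growth, FP-Growth, or FPgrowth*: for brevity it stores each conditional database as a plain list of transactions instead of a prefix-tree, and the function name `mine_frequent_itemsets`, the helper `grow`, and the sample data are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def mine_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} using simple pattern growth.

    Illustrative only: conditional databases are plain lists here,
    not the prefix-trees used by the algorithms named in the abstract.
    """
    results = {}

    def grow(database, prefix):
        # Support counting: every transaction holds each item at most once,
        # so counting occurrences equals counting transactions.
        counts = Counter(item for t in database for item in t)
        frequent = sorted(i for i, c in counts.items() if c >= min_support)

        for idx, item in enumerate(frequent):
            itemset = prefix + (item,)
            results[frozenset(itemset)] = counts[item]

            # Conditional database of `item`: transactions that contain it,
            # restricted to frequent items that come later in the order,
            # so that each itemset is generated exactly once.
            later = set(frequent[idx + 1:])
            conditional = [
                [i for i in t if i in later]
                for t in database
                if item in t
            ]
            conditional = [t for t in conditional if t]
            if conditional:
                grow(conditional, itemset)

    # Remove duplicate items inside each transaction before mining.
    grow([sorted(set(t)) for t in transactions], ())
    return results


if __name__ == "__main__":
    # Hypothetical toy database of five transactions.
    db = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(
        mine_frequent_itemsets(db, min_support=3).items(),
        key=lambda kv: (len(kv[0]), sorted(kv[0])),
    ):
        print(sorted(itemset), support)
```

Each recursive call here rescans and copies its list-based conditional database. Prefix-tree representations compress transactions that share prefixes, and the abstract identifies traversing, counting over, and constructing those trees as the costs that BFP-growth reduces.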

Mengchi Liu | Jun-Feng Qu
