A High-Performance Algorithm for Frequent Itemset Mining

Frequent itemsets, also called frequent patterns, are important information about databases, and mining efficiently frequent itemsets is a core problem in data mining area. Pattern growth approaches, such as the classic FP-Growth algorithm and the efficient FPgrowth* algorithm, can solve the problem. The approaches mine frequent itemsets by constructing recursively conditional databases that are usually represented by prefix-trees. The three major costs of such approaches are prefix-tree traversal, support counting, and prefix-tree construction. This paper presents a novel pattern growth algorithm called BFP-growth in which the three costs are greatly reduced. We compare the costs among BFP-growth, FP-Growth, and FPgrowth*, and illuminate that the costs of BFP-growth are the least. Experimental data show that BFP-growth outperforms not only FP-Growth and FPgrowth* but also several famous algorithms including dEclat and LCM, ones of the fastest algorithms, for various databases.

[1]  Hongjun Lu,et al.  AFOPT: An Efficient Implementation of Pattern Growth Approach , 2003, FIMI.

[2]  Lars Schmidt-Thieme,et al.  Algorithmic Features of Eclat , 2004, FIMI.

[3]  Mohammed J. Zaki,et al.  Fast vertical mining using diffsets , 2003, KDD '03.

[4]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[5]  Hiroki Arimura,et al.  LCM ver. 2: Efficient Mining Algorithms for Frequent/Closed/Maximal Itemsets , 2004, FIMI.

[6]  John F. Roddick,et al.  Association mining , 2006, CSUR.

[7]  Srinivasan Parthasarathy,et al.  Cache-conscious frequent pattern mining on modern and emerging processors , 2007, The VLDB Journal.

[8]  Hongjun Lu,et al.  Ascending frequency ordered prefix-tree: efficient mining of frequent patterns , 2003, Eighth International Conference on Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings..

[9]  J. Yu,et al.  Efficient Mining of Frequent Patterns Using Ascending Frequency Ordered Prefix-Tree , 2004, Data Mining and Knowledge Discovery.

[10]  Gerd Stumme,et al.  Mining frequent patterns with counting inference , 2000, SKDD.

[11]  Tobias Bjerregaard,et al.  A survey of research and practices of Network-on-chip , 2006, CSUR.

[12]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[13]  Gösta Grahne,et al.  Fast algorithms for frequent itemset mining using FP-trees , 2005, IEEE Transactions on Knowledge and Data Engineering.

[14]  Jian Pei,et al.  Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[15]  Hiroki Arimura,et al.  LCM ver.3: collaboration of array, bitmap and prefix tree for frequent itemset mining , 2005 .

[16]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[17]  Philip S. Yu,et al.  Direct Discriminative Pattern Mining for Effective Classification , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[18]  Jing-Rung Yu,et al.  FIUT: A new method for mining frequent itemsets , 2009, Inf. Sci..

[19]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[20]  Philip S. Yu,et al.  Clustering by pattern similarity in large data sets , 2002, SIGMOD '02.

[21]  Sanguthevar Rajasekaran,et al.  A transaction mapping algorithm for frequent itemsets mining , 2006 .

[22]  Wolfgang Lehner,et al.  Memory-efficient frequent-itemset mining , 2011, EDBT/ICDT '11.