A Space Optimization for FP-Growth

The frequency mining problem lies at the core of several data mining algorithms. Among frequent pattern discovery algorithms, FP-GROWTH employs a unique search strategy based on compact structures, yielding a high-performance algorithm that requires only two database passes. We introduce an enhanced version of this algorithm, called FP-GROWTH-TINY, which can mine larger databases thanks to a space optimization that eliminates the need for intermediate conditional pattern bases. We present in detail the algorithms required for directly constructing a conditional FP-Tree. Experiments demonstrate that our implementation has running time comparable to the original algorithm while reducing memory use by up to a factor of two.
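To make the idea of direct construction concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simple FP-Tree with per-item node lists, and every name in it (`Node`, `FPTree`, `prefix_path`, `conditional_tree`, `mine`) is hypothetical. For a given item, the conditional tree is built by walking the prefix paths in the parent tree twice, once to count item supports and once to insert the filtered paths, so no separate list of conditional patterns is ever stored.

```python
from collections import defaultdict


class Node:
    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


class FPTree:
    def __init__(self):
        self.root = Node(None, None)
        self.links = defaultdict(list)   # item -> nodes carrying that item
        self.support = defaultdict(int)  # item -> total count in this tree

    def insert(self, items, count=1):
        """Insert an ordered item list, adding `count` to every node on its path."""
        node = self.root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                self.links[item].append(child)
            child.count += count
            self.support[item] += count
            node = child


def prefix_path(node):
    """Items strictly above `node`, listed from the root downwards."""
    path = []
    node = node.parent
    while node.parent is not None:
        path.append(node.item)
        node = node.parent
    path.reverse()
    return path


def build_tree(transactions, min_support):
    """Two database passes: count items, then insert the filtered transactions."""
    freq = defaultdict(int)
    for t in transactions:
        for item in set(t):
            freq[item] += 1
    tree = FPTree()
    for t in transactions:
        items = [i for i in set(t) if freq[i] >= min_support]
        items.sort(key=lambda i: (-freq[i], i))
        tree.insert(items)
    return tree


def conditional_tree(tree, item, min_support):
    """Build the conditional tree for `item` directly from `tree` (illustrative)."""
    # Pass 1 over the prefix paths: support of each item given `item`.
    freq = defaultdict(int)
    for node in tree.links[item]:
        for i in prefix_path(node):
            freq[i] += node.count
    # Pass 2: insert each filtered, reordered path without storing it anywhere.
    cond = FPTree()
    for node in tree.links[item]:
        items = [i for i in prefix_path(node) if freq[i] >= min_support]
        items.sort(key=lambda i: (-freq[i], i))
        cond.insert(items, node.count)
    return cond


def mine(tree, min_support, suffix=()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    result = {}
    for item, count in list(tree.support.items()):
        if count < min_support:
            continue
        result[frozenset(suffix + (item,))] = count
        cond = conditional_tree(tree, item, min_support)
        result.update(mine(cond, min_support, suffix + (item,)))
    return result


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
frequent = mine(build_tree(transactions, 3), 3)
for itemset, support in sorted(frequent.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), support)
```

The sketch trades time for space: each prefix path is traversed twice instead of being copied into an intermediate structure. Whether this matches the traversal order or bookkeeping used in FP-GROWTH-TINY is an assumption; the sketch only illustrates the general principle stated in the abstract.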
