Fast algorithms for frequent itemset mining using FP-trees

Efficient algorithms for mining frequent itemsets are crucial for mining association rules as well as for many other data mining tasks. Methods for mining frequent itemsets have been implemented using a prefix-tree structure, known as an FP-tree, for storing compressed information about frequent itemsets. Numerous experimental results have demonstrated that these algorithms perform extremely well. In this paper, we present a novel FP-array technique that greatly reduces the need to traverse FP-trees, thus obtaining significantly improved performance for FP-tree-based algorithms. Our technique works especially well for sparse data sets. Furthermore, we present new algorithms for mining all, maximal, and closed frequent itemsets. Our algorithms use the FP-tree data structure in combination with the FP-array technique efficiently and incorporate various optimization techniques. We also present experimental results comparing our methods with existing algorithms. The results show that our methods are the fastest for many cases. Even though the algorithms consume much memory when the data sets are sparse, they are still the fastest ones when the minimum support is low. Moreover, they are always among the fastest algorithms and consume less memory than other methods when the data sets are dense.

[1]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[2]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[3]  R. Agarwal Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[4]  Heikki Mannila,et al.  Efficient Algorithms for Discovering Association Rules , 1994, KDD Workshop.

[5]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[6]  Ramakrishnan Srikant,et al.  Mining sequential patterns , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[7]  Hannu Toivonen,et al.  Sampling Large Databases for Association Rules , 1996, VLDB.

[8]  Jiawei Han,et al.  Metarule-Guided Mining of Multi-Dimensional Association Rules Using Data Cubes , 1997, KDD.

[9]  Philip S. Yu,et al.  Using a Hash-Based Method with Transaction Trimming for Mining Association Rules , 1997, IEEE Trans. Knowl. Data Eng..

[10]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[11]  Nicolas Pasquier,et al.  Discovering Frequent Closed Itemsets for Association Rules , 1999, ICDT.

[12]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[13]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[14]  Jian Pei,et al.  CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets , 2000, ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

[15]  Johannes Gehrke,et al.  MAFIA: a maximal frequent itemset algorithm for transactional databases , 2001, Proceedings 17th International Conference on Data Engineering.

[16]  Mohammed J. Zaki,et al.  Efficiently mining maximal frequent itemsets , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[17]  Mohammed J. Zaki,et al.  CHARM: An Efficient Algorithm for Closed Itemset Mining , 2002, SDM.

[18]  Jiawei Han,et al.  Mining top-k frequent closed patterns without minimum support , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[19]  Wesley W. Chu,et al.  SmartMiner: a depth first algorithm guided by tail information for mining maximal frequent itemsets , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[20]  Hongjun Lu,et al.  AFOPT: An Efficient Implementation of Pattern Growth Approach , 2003, FIMI.

[21]  Fabrizio Silvestri,et al.  kDCI: a Multi-Strategy Algorithm for Mining Frequent Sets , 2003, FIMI.

[22]  Andrea Pietracaprina,et al.  Mining Frequent Itemsets using Patricia Tries , 2003, FIMI.

[23]  G. Grahne,et al.  High Performance Mining of Maximal Frequent Itemsets Gösta , 2003 .

[24]  Christian Borgelt,et al.  EFFICIENT IMPLEMENTATIONS OF APRIORI AND ECLAT , 2003 .

[25]  Jiawei Han,et al.  Star-Cubing: Computing Iceberg Cubes by Top-Down and Bottom-Up Integration , 2003, Very Large Data Bases Conference.

[26]  Mohammed J. Zaki,et al.  Fast vertical mining using diffsets , 2003, KDD '03.

[27]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[28]  Heikki Mannila,et al.  Discovery of Frequent Episodes in Event Sequences , 1997, Data Mining and Knowledge Discovery.

[29]  Heikki Mannila,et al.  Levelwise Search and Borders of Theories in Knowledge Discovery , 1997, Data Mining and Knowledge Discovery.

[30]  Gösta Grahne,et al.  Mining frequent itemsets from secondary memory , 2004, Fourth IEEE International Conference on Data Mining (ICDM'04).

[31]  Timothy J. Purcell Sorting and searching , 2005, SIGGRAPH Courses.

[32]  Mohammed J. Zaki,et al.  Efficient algorithms for mining closed itemsets and their lattice structure , 2005, IEEE Transactions on Knowledge and Data Engineering.

[33]  Jian Pei,et al.  Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).