Mining frequent item sets by opportunistic projection

In this paper, we present a novel algorithm Opportune Project for mining complete set of frequent item sets by projecting databases to grow a frequent item set tree. Our algorithm is fundamentally different from those proposed in the past in that it opportunistically chooses between two different structures, array-based or tree-based, to represent projected transaction subsets, and heuristically decides to build unfiltered pseudo projection or to make a filtered copy according to features of the subsets. More importantly, we propose novel methods to build tree-based pseudo projections and array-based unfiltered projections for projected transaction subsets, which makes our algorithm both CPU time efficient and memory saving. Basically, the algorithm grows the frequent item set tree by depth first search, whereas breadth first search is used to build the upper portion of the tree if necessary. We test our algorithm versus several other algorithms on real world datasets, such as BMS-POS, and on IBM artificial datasets. The empirical results show that our algorithm is not only the most efficient on both sparse and dense databases at all levels of support threshold, but also highly scalable to very large databases.

[1]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[2]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[3]  Ron Kohavi,et al.  Real world performance of association rule algorithms , 2001, KDD '01.

[4]  Zvi M. Kedem,et al.  Pincer-Search: A New Algorithm for Discovering the Maximum Frequent Set , 1998, EDBT.

[5]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[6]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[7]  Hannu Toivonen,et al.  Sampling Large Databases for Association Rules , 1996, VLDB.

[8]  Jiawei Han,et al.  Discovery of Multiple-Level Association Rules from Large Databases , 1995, VLDB.

[9]  Johannes Gehrke,et al.  MAFIA: a maximal frequent itemset algorithm for transactional databases , 2001, Proceedings 17th International Conference on Data Engineering.

[10]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[11]  Rakesh Agarwal,et al.  Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[12]  Charu C. Aggarwal,et al.  A Tree Projection Algorithm for Generation of Frequent Item Sets , 2001, J. Parallel Distributed Comput..

[13]  Hongjun Lu,et al.  H-mine: hyper-structure mining of frequent patterns in large databases , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[14]  Philip S. Yu,et al.  An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.

[15]  Rajeev Motwani,et al.  Dynamic itemset counting and implication rules for market basket data , 1997, SIGMOD '97.

[16]  Ramesh C Agarwal,et al.  Depth first generation of long patterns , 2000, KDD '00.