Mining Frequent Itemsets Using Re-Usable Data Structure

Several algorithms have been introduced for mining frequent itemsets. The recent dataset transformation approach suffers either from a possible increase in the number of structures produced during the execution of the algorithm or from the processing time spent projecting or decomposing the datasets. Moreover, the constructed structure cannot be re-used in ad-hoc mining queries or in other mining processes. In this paper, the ItemSet Tree (IST) structure is used to count itemset support effectively and so overcome these limitations. To speed up the support counting process, the use of Guidance Information Bits and tree size reduction is proposed. The TDF algorithm is proposed to find all frequent itemsets. TDF explores the frequent itemset search space depth-first, generating candidates from the search space and counting their support in the IST. Several experiments have been conducted to study the performance of the TDF algorithm.
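
The idea of building one structure up front, then generating candidates depth-first and counting their support in that structure, can be illustrated with a minimal sketch. It assumes a plain prefix tree over sorted transactions as a simplified stand-in for the IST; it is not the IST as defined in the cited work, and it omits the Guidance Information Bits and tree size reduction. The names `Node`, `build_tree`, `support`, and `mine_depth_first` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Number of transactions whose sorted item list passes through this node.
    count: int = 0
    children: dict = field(default_factory=dict)


def build_tree(transactions):
    """Build a prefix tree over sorted transactions (simplified stand-in for the IST)."""
    root = Node()
    for transaction in transactions:
        root.count += 1
        node = root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node())
            node.count += 1
    return root


def support(node, itemset, i=0):
    """Count transactions under `node` that contain every item in itemset[i:].

    `itemset` must be a sorted tuple. Items along any path are ascending, so a
    child larger than the next wanted item cannot lead to that item.
    """
    if i == len(itemset):
        return node.count
    wanted = itemset[i]
    total = 0
    for item, child in node.children.items():
        if item < wanted:
            total += support(child, itemset, i)      # wanted item may lie deeper
        elif item == wanted:
            total += support(child, itemset, i + 1)  # matched, look for the next item
    return total


def mine_depth_first(root, items, min_support):
    """Enumerate candidates depth-first and keep those meeting min_support."""
    items = sorted(items)
    frequent = {}

    def extend(prefix, start):
        for k in range(start, len(items)):
            candidate = prefix + (items[k],)
            s = support(root, candidate)
            if s >= min_support:
                frequent[candidate] = s
                extend(candidate, k + 1)  # only frequent prefixes are extended

    extend((), 0)
    return frequent


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
tree = build_tree(transactions)
result = mine_depth_first(tree, {"a", "b", "c"}, min_support=3)
```

Because the tree is built once and never modified by the search, the same `tree` can answer later ad-hoc queries such as `support(tree, ("a", "c"))` without another pass over the transactions, which is the re-use property the abstract emphasizes.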
