Fast mining frequent itemsets using Nodesets

Node-list and N-list, two novel data structure proposed in recent years, have been proven to be very efficient for mining frequent itemsets. The main problem of these structures is that they both need to encode each node of a PPC-tree with pre-order and post-order code. This causes that they are memory-consuming and inconvenient to mine frequent itemsets. In this paper, we propose Nodeset, a more efficient data structure, for mining frequent itemsets. Nodesets require only the pre-order (or post-order code) of each node, which makes it saves half of memory compared with N-lists and Node-lists. Based on Nodesets, we present an efficient algorithm called FIN to mining frequent itemsets. For evaluating the performance of FIN, we have conduct experiments to compare it with PrePost and FP-growth∗, two state-of-the-art algorithms, on a variety of real and synthetic datasets. The experimental results show that FIN is high performance on both running time and memory usage.

[1]  Mohammed J. Zaki Scalable Algorithms for Association Mining , 2000, IEEE Trans. Knowl. Data Eng..

[2]  Shamkant B. Navathe,et al.  An Efficient Algorithm for Mining Association Rules in Large Databases , 1995, VLDB.

[3]  Bhatnagar Divya,et al.  Mining Frequent Itemsets without Candidate Generation using Optical Neural Network , 2011 .

[4]  Hongjun Lu,et al.  H-mine: hyper-structure mining of frequent patterns in large databases , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[5]  Bay Vo,et al.  MEI: An efficient algorithm for mining erasable itemsets , 2014, Eng. Appl. Artif. Intell..

[6]  Frans Coenen,et al.  An Efficient Algorithm for Mining Erasable Itemsets Using the Difference of NC-Sets , 2013, 2013 IEEE International Conference on Systems, Man, and Cybernetics.

[7]  J. Yu,et al.  Efficient Mining of Frequent Patterns Using Ascending Frequency Ordered Prefix-Tree , 2004, Data Mining and Knowledge Discovery.

[8]  Mohammed J. Zaki,et al.  Fast vertical mining using diffsets , 2003, KDD '03.

[9]  Das Amrita,et al.  Mining Association Rules between Sets of Items in Large Databases , 2013 .

[10]  Ee-Peng Lim,et al.  A support-ordered trie for fast frequent itemset discovery , 2004, IEEE Transactions on Knowledge and Data Engineering.

[11]  Roberto J. Bayardo,et al.  Efficiently mining long patterns from databases , 1998, SIGMOD '98.

[12]  Ron Rymon,et al.  Search through Systematic Set Enumeration , 1992, KR.

[13]  Ramakrishnan Srikant,et al.  Fast algorithms for mining association rules , 1998, VLDB 1998.

[14]  Zhihong Deng,et al.  A New Fast Vertical Method for Mining Frequent Patterns , 2010 .

[15]  Zhi-Hong Deng,et al.  Fast mining Top-Rank-k frequent patterns by using Node-lists , 2014, Expert Syst. Appl..

[16]  Zhi-Hong Deng,et al.  Mining erasable itemsets , 2009, 2009 International Conference on Machine Learning and Cybernetics.

[17]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[18]  Xiaoran Xu,et al.  Fast mining erasable itemsets using NC_sets , 2012, Expert Syst. Appl..

[19]  Anthony J. T. Lee,et al.  An efficient algorithm for mining closed inter-transaction itemsets , 2008, Data Knowl. Eng..

[20]  Zhonghui Wang,et al.  A new algorithm for fast mining frequent itemsets using N-lists , 2012, Science China Information Sciences.

[21]  Gösta Grahne,et al.  Fast algorithms for frequent itemset mining using FP-trees , 2005, IEEE Transactions on Knowledge and Data Engineering.

[22]  Johannes Gehrke,et al.  MAFIA: a maximal frequent itemset algorithm , 2005, IEEE Transactions on Knowledge and Data Engineering.

[23]  Xin Li,et al.  Mining frequent patterns from network flows for monitoring network , 2010, Expert Syst. Appl..

[24]  Won Suk Lee,et al.  Finding recent frequent itemsets adaptively over online data streams , 2003, KDD '03.

[25]  Jian Pei,et al.  CLOSET+: searching for the best strategies for mining frequent closed itemsets , 2003, KDD '03.

[26]  Devavrat Shah,et al.  Turbo-charging vertical mining of large databases , 2000, SIGMOD '00.