DiffNodesets: An efficient structure for fast mining frequent itemsets

Abstract: Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, several itemset representations based on node sets have been proposed and have been shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, for mining frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and, in some cases, directly enumerates frequent itemsets without candidate generation. To evaluate the performance of dFIN, we conducted extensive experiments comparing it against existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms.
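To make the set-enumeration tree idea concrete, the following is a minimal sketch of depth-first frequent itemset mining over such a tree. It is not dFIN: it assumes plain transaction-ID sets in place of the DiffNodeset structure, omits the hybrid search strategy and the direct enumeration shortcut, and the function name `mine_frequent_itemsets` and its parameters are hypothetical, chosen only for illustration.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def mine_frequent_itemsets(
    transactions: List[Set[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Illustrative miner: walks a set-enumeration tree depth-first.

    Assumption: support is an absolute count, and itemsets are represented
    by plain transaction-ID sets rather than DiffNodesets.
    """
    # Vertical layout: map each item to the ids of the transactions containing it.
    tidsets: Dict[str, Set[int]] = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            tidsets.setdefault(item, set()).add(tid)

    # Keep only frequent items, ordered by ascending support (ties by name).
    frequent_items = sorted(
        (item for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda item: (len(tidsets[item]), item),
    )

    result: Dict[FrozenSet[str], int] = {}

    def extend(
        prefix: Tuple[str, ...], prefix_tids: Set[int], candidates: List[str]
    ) -> None:
        # Each child of a tree node adds one item that comes later in the order.
        for i, item in enumerate(candidates):
            tids = prefix_tids & tidsets[item]
            if len(tids) < min_support:
                continue  # prune: every superset in this subtree is infrequent too
            itemset = prefix + (item,)
            result[frozenset(itemset)] = len(tids)
            extend(itemset, tids, candidates[i + 1:])

    extend((), set(range(len(transactions))), frequent_items)
    return result


if __name__ == "__main__":
    sample = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in mine_frequent_itemsets(sample, min_support=3).items():
        print(sorted(itemset), support)
```

Each recursive call corresponds to one node of the set-enumeration tree, and only items later in the fixed order are tried as extensions, so every itemset is visited at most once.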
