negFIN: An efficient algorithm for fast mining frequent itemsets

Abstract Frequent itemset mining is a fundamental data mining task with numerous applications in other data mining tasks. In recent years, several data structures based on sets of nodes in a prefix tree have been presented. These data structures store essential information about frequent itemsets. In this paper, we propose another efficient data structure, NegNodeset. Like these earlier structures, NegNodeset is built on sets of nodes in a prefix tree. It employs a novel encoding model for the nodes of a prefix tree, based on the bitmap representation of sets. Building on the NegNodeset data structure, we propose negFIN, an efficient algorithm for frequent itemset mining. The efficiency of negFIN stems from three factors: (1) the NegNodesets of itemsets are extracted using bitwise operators, (2) the complexity of calculating NegNodesets and counting supports is reduced to O(n), where n is the cardinality of the NegNodeset, and (3) it employs a set-enumeration tree to generate frequent itemsets and uses a promotion method to prune the search space in this tree. Our extensive performance study on a variety of benchmark datasets indicates that negFIN is the fastest algorithm among the previous state-of-the-art algorithms compared, although it runs at the same speed as dFIN on some datasets.
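To make the bitmap idea concrete, the following is a minimal sketch, not the NegNodeset structure or the negFIN algorithm itself. It assumes a plain prefix tree in which every node stores, as an integer bitmap, the set of items on the path from the root to that node, and it counts the support of an itemset with a single bitwise test per node. The names `Node`, `build_tree`, and `support` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class Node:
    """Prefix-tree node; `bitmap` encodes the items on the root-to-node path."""

    def __init__(self, item, bitmap):
        self.item = item
        self.bitmap = bitmap
        self.count = 0
        self.children = {}


def build_tree(transactions, order):
    """Insert transactions into a prefix tree, items sorted by `order`.

    Returns the item -> bit position map and, per item, the list of nodes
    labelled with that item (illustrative layout, an assumption of this sketch).
    """
    index = {item: i for i, item in enumerate(order)}
    root = Node(None, 0)
    nodes_by_item = defaultdict(list)
    for transaction in transactions:
        items = sorted((i for i in set(transaction) if i in index),
                       key=index.__getitem__)
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                # The child's bitmap is the parent's bitmap plus the new item.
                child = Node(item, node.bitmap | (1 << index[item]))
                node.children[item] = child
                nodes_by_item[item].append(child)
            child.count += 1
            node = child
    return index, nodes_by_item


def support(itemset, index, nodes_by_item):
    """Count transactions containing `itemset` with one bitwise test per node."""
    mask = 0
    for item in itemset:
        mask |= 1 << index[item]
    # Only nodes labelled with the last item (in tree order) need checking.
    last = max(itemset, key=index.__getitem__)
    return sum(n.count for n in nodes_by_item[last]
               if (n.bitmap & mask) == mask)


transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"],
                ["b", "c"], ["a", "b", "c"]]
index, nodes_by_item = build_tree(transactions, order=["a", "b", "c"])
print(support({"a", "c"}, index, nodes_by_item))
```

In this sketch the support computation is linear in the number of nodes labelled with the last item of the itemset, which mirrors the spirit of the O(n) claim above under the stated assumptions; the paper's actual encoding and promotion-based pruning are not reproduced here.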
