Mining Frequent Itemsets Using Node-Sets of a Prefix-Tree

Frequent itemsets capture important information about a database, and mining them efficiently is a core problem in data mining. The divide-and-conquer strategy is well suited to this problem. Most algorithms that adopt it construct a very large number of conditional databases during mining, so the representation of conditional databases and the method of constructing them greatly influence performance. In this study, we propose a node-set structure for representing a conditional database and develop a novel node-set-based algorithm, NS, for mining frequent itemsets. During mining, all node-sets are derived from a prefix-tree that stores the complete frequent itemset information of the mined database. Compared with previous conditional database representations, node-sets are compact and contiguous, which allows NS to process them quickly. Constructing conditional databases involves counting items. In NS, the counting procedure and the construction procedure are merged, which saves the time otherwise spent scanning conditional databases; moreover, the major operations in constructing conditional databases are very simple comparisons. Experimental results show that NS outperforms several well-known algorithms, including FPgrowth* and LCM, two of the fastest algorithms, on various databases.
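
To make the divide-and-conquer idea concrete, the following is a minimal sketch of generic pattern-growth mining in which each item's conditional database is built and counted in the same pass. It is not the NS algorithm: conditional databases here are plain Python lists of item suffixes rather than node-sets derived from a prefix-tree, and the names `mine` and `_grow` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def mine(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets.

    Illustrative sketch only: conditional databases are plain lists,
    not the node-set representation proposed in the paper.
    """
    # Count each item once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    # Fix a global item order: ascending support, ties broken by item value.
    frequent = sorted((i for i, c in counts.items() if c >= min_support),
                      key=lambda i: (counts[i], i))
    order = {item: rank for rank, item in enumerate(frequent)}
    # Drop infrequent items and sort every transaction by the global order.
    db = [sorted((i for i in set(t) if i in order), key=order.get)
          for t in transactions]
    result = {}
    _grow(db, (), min_support, result)
    return result


def _grow(db, prefix, min_support, result):
    # A single pass builds the conditional database of every item and
    # counts it at the same time: the support of prefix + item equals the
    # number of suffixes collected for that item.
    conditional = {}
    for t in db:
        for pos, item in enumerate(t):
            conditional.setdefault(item, []).append(t[pos + 1:])
    for item, suffixes in conditional.items():
        if len(suffixes) >= min_support:
            itemset = prefix + (item,)
            result[frozenset(itemset)] = len(suffixes)
            # Recurse into the conditional database of the extended prefix.
            _grow(suffixes, itemset, min_support, result)


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(mine(data, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

Because every transaction is sorted by one global item order, each itemset is reached along exactly one path of extensions, so no itemset is reported twice. The list-of-suffixes representation copies data at every level of recursion, which is the kind of cost a more compact conditional database representation aims to reduce.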
