An Efficient Data Structure for Fast Mining High Utility Itemsets

In this paper, we propose a novel data structure called the PUN-list, which maintains both the utility information of an itemset and a utility upper bound to facilitate mining high utility itemsets. Based on PUN-lists, we present a method called MIP (Mining high utility Itemset using PUN-Lists) for fast mining of high utility itemsets. The efficiency of MIP rests on three techniques. First, itemsets are represented by a highly condensed data structure, the PUN-list, which avoids costly, repeated utility computation. Second, the utility of an itemset can be calculated efficiently by scanning its PUN-list, and the PUN-lists of long itemsets can be constructed quickly from the PUN-lists of short itemsets. Third, by using the utility upper bound stored in the PUN-lists as the pruning strategy, MIP discovers high utility itemsets directly from the search space, a set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that the PUN-list is very effective: MIP is, on average, at least an order of magnitude faster than recently reported algorithms.
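The abstract does not spell out the internal layout of a PUN-list, so the following is only a minimal sketch of the three ideas it names, using a simpler stand-in structure. It assumes a per-itemset list that stores, for each transaction containing the itemset, the itemset's utility in that transaction (`iutil`) and the utility of the items that follow it in a fixed item order (`rutil`). The sum of `iutil + rutil` then serves as the utility upper bound for pruning a depth-first walk of the set-enumeration tree. All names here (`build_item_lists`, `join`, `mine`, and the toy database) are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

Transaction = Dict[str, int]               # item -> purchased quantity
UtilityList = Dict[int, Tuple[int, int]]   # tid -> (iutil, rutil); assumed layout


def build_item_lists(db: List[Transaction], profit: Dict[str, int]) -> Dict[str, UtilityList]:
    """Build one list per single item, using alphabetical item order."""
    lists: Dict[str, UtilityList] = {}
    for tid, tx in enumerate(db):
        items = sorted(tx)
        remaining = sum(tx[i] * profit[i] for i in items)
        for item in items:
            util = tx[item] * profit[item]
            remaining -= util  # utility of the items after `item` in this transaction
            lists.setdefault(item, {})[tid] = (util, remaining)
    return lists


def join(prefix: Optional[UtilityList], px: UtilityList, py: UtilityList) -> UtilityList:
    """Build the list of P+x+y from the lists of P+x and P+y (P may be empty)."""
    out: UtilityList = {}
    for tid, (iu_x, _) in px.items():
        if tid in py:
            iu_y, ru_y = py[tid]
            # The prefix utility is counted in both px and py, so subtract it once.
            iu_p = prefix[tid][0] if prefix is not None else 0
            out[tid] = (iu_x + iu_y - iu_p, ru_y)
    return out


def mine(db: List[Transaction], profit: Dict[str, int], min_util: int) -> Dict[Tuple[str, ...], int]:
    """Depth-first search of the set-enumeration tree with upper-bound pruning."""
    lists = build_item_lists(db, profit)
    results: Dict[Tuple[str, ...], int] = {}

    def search(prefix_items, prefix_list, extensions):
        for i, (x, ulx) in enumerate(extensions):
            itemset = prefix_items + (x,)
            util = sum(iu for iu, _ in ulx.values())        # exact utility
            bound = sum(iu + ru for iu, ru in ulx.values())  # bound for all supersets below
            if util >= min_util:
                results[itemset] = util
            if bound >= min_util:  # otherwise no extension can qualify
                children = []
                for y, uly in extensions[i + 1:]:
                    joined = join(prefix_list, ulx, uly)
                    if joined:
                        children.append((y, joined))
                search(itemset, ulx, children)

    search((), None, [(x, lists[x]) for x in sorted(lists)])
    return results


# Toy database (made up for illustration): quantities per transaction and unit profits.
db = [{"a": 1, "b": 2, "c": 1}, {"a": 2, "c": 3}, {"b": 1, "c": 2}]
profit = {"a": 5, "b": 2, "c": 1}
print(mine(db, profit, min_util=10))
```

In this toy run the branches rooted at `b` and `c` are cut off by the bound before any of their supersets are built, which is the kind of saving the pruning strategy is meant to provide. The exact bound and list layout used by MIP may differ from this stand-in.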
