Paper information - An efficient structure for fast mining high utility itemsets

An efficient structure for fast mining high utility itemsets

High utility itemset mining has emerged as an important research issue in data mining because it has a wide range of real-life applications. Although a number of algorithms have been proposed in recent years, mining efficiency remains a major challenge: these algorithms either calculate candidates' utilities inefficiently or generate a huge number of candidates. In this paper, we propose a novel data structure named PUN-list (PU-tree-Node list), which maintains both the utility information about an itemset and a utility upper bound to facilitate mining high utility itemsets. Based on PUN-lists, we present a method named MIP (Mining high utility Itemset using PUN-Lists) for efficiently mining high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, the PUN-list, which avoids costly and repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning its PUN-list, and the PUN-lists of long itemsets can be efficiently constructed from the PUN-lists of short itemsets. Third, by employing the utility upper bound stored in the PUN-lists as the pruning strategy, MIP directly discovers high utility itemsets from the search space, a set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that MIP is much faster than HUI-Miner, d2HUP, and UP-Growth+, especially on dense datasets.
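The three ideas in the abstract (scanning a list to get an itemset's utility, building the list of a longer itemset from the lists of shorter ones, and pruning with a utility upper bound during a set-enumeration search) can be illustrated with a rough sketch. This is not the PUN-list itself, whose layout the abstract does not specify. It assumes a simpler per-transaction utility list that maps each transaction id to a pair (utility, remaining utility), in the style of vertical utility-list methods. The toy database, the item order, and all function names are hypothetical.

```python
# Toy database (hypothetical): transaction id -> {item: utility in that transaction}
TRANSACTIONS = {
    1: {"a": 5, "b": 2, "c": 1},
    2: {"a": 10, "c": 6},
    3: {"b": 4, "c": 3, "d": 8},
}
ORDER = ["a", "b", "c", "d"]  # fixed processing order of items (assumed)


def item_list(item):
    """Utility list of a single item: tid -> (utility, remaining utility).

    The remaining utility is the total utility of the items that come
    after `item` in ORDER within the same transaction.
    """
    pos = ORDER.index(item)
    out = {}
    for tid, row in TRANSACTIONS.items():
        if item in row:
            remaining = sum(u for i, u in row.items() if ORDER.index(i) > pos)
            out[tid] = (row[item], remaining)
    return out


def join(p, px, py):
    """Build the list of itemset Pxy from the lists of P, Px and Py.

    `p` is None when the shared prefix P is empty.
    """
    out = {}
    for tid, (ux, _) in px.items():
        if tid in py:
            uy, ry = py[tid]
            up = p[tid][0] if p else 0  # the prefix utility is counted twice in ux + uy
            out[tid] = (ux + uy - up, ry)
    return out


def mine(prefix, p_list, extensions, min_util, found):
    """Depth-first walk of the set-enumeration tree below `prefix`."""
    for i, (x, x_list) in enumerate(extensions):
        itemset = prefix + [x]
        # Utility of the itemset: a single scan of its list.
        utility = sum(u for u, _ in x_list.values())
        if utility >= min_util:
            found[tuple(itemset)] = utility
        # Upper bound on the utility of any extension of the itemset.
        bound = sum(u + r for u, r in x_list.values())
        if bound >= min_util:
            children = [(y, join(p_list, x_list, y_list))
                        for y, y_list in extensions[i + 1:]]
            children = [(y, lst) for y, lst in children if lst]
            mine(itemset, x_list, children, min_util, found)


def mine_high_utility(min_util):
    """Return {itemset: utility} for all itemsets with utility >= min_util."""
    found = {}
    singles = [(item, item_list(item)) for item in ORDER]
    mine([], None, singles, min_util, found)
    return found
```

On this toy database, `mine_high_utility(15)` returns the itemsets `("a",)`, `("a", "c")`, and `("b", "c", "d")` with utilities 15, 22, and 15. Branches such as `("a", "b")`, whose bound is 8, are cut off without their extensions ever being built.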

Zhi-Hong Deng | Zhihong Deng
