UP-Hist Tree: efficient data structure for high utility pattern mining from transaction databases

High-utility itemset mining is an emerging research area in data mining. Several algorithms proposed for finding high-utility itemsets in transaction databases rely on a data structure called the UP-tree. However, UP-tree-based algorithms generate a large number of candidates, because the UP-tree holds too little information to compute tight utility estimates for itemsets. In this paper, we present a data structure named the UP-Hist tree, which maintains a histogram of item quantities at each node of the tree. The histogram allows better utility estimates to be computed and therefore more effective pruning of the search space. Extensive experiments on real and synthetic datasets show that our algorithm based on the UP-Hist tree outperforms state-of-the-art algorithms in both the total number of candidate high-utility itemsets generated and the total execution time. The UP-Hist tree has a low memory footprint, ranging from only a few KB to a few MB.
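
To make the idea of a per-node quantity histogram concrete, the following is a minimal Python sketch, not the implementation from the paper. The names `HistNode`, `insert`, and `quantity_bounds` are hypothetical, and the node layout (a prefix tree whose nodes map each observed quantity to the number of transactions carrying it) is an assumption made for illustration. It shows how such a histogram could yield lower and upper bounds on the total quantity of an item over a given number of transactions, which is the kind of information a tighter utility estimate might draw on.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HistNode:
    """Hypothetical tree node: an item plus a histogram of its quantities."""
    item: str
    count: int = 0                                   # transactions through this node
    hist: Counter = field(default_factory=Counter)   # quantity -> number of transactions
    children: dict = field(default_factory=dict)     # item -> child HistNode

    def add(self, quantity: int) -> None:
        self.count += 1
        self.hist[quantity] += 1


def insert(root: HistNode, transaction) -> None:
    """Insert a transaction given as an ordered list of (item, quantity) pairs."""
    node = root
    for item, qty in transaction:
        node = node.children.setdefault(item, HistNode(item))
        node.add(qty)


def quantity_bounds(node: HistNode, support: int):
    """Return (min, max) total quantity over any `support` transactions at `node`."""
    def take(pairs):
        total, left = 0, support
        for qty, n in pairs:
            used = min(n, left)
            total += qty * used
            left -= used
            if left == 0:
                break
        return total

    ordered = sorted(node.hist.items())              # ascending by quantity
    return take(ordered), take(reversed(ordered))


# Illustrative data only: three small transactions.
root = HistNode("root")
insert(root, [("a", 1), ("b", 2)])
insert(root, [("a", 3), ("b", 1)])
insert(root, [("a", 2)])

node_a = root.children["a"]
print(quantity_bounds(node_a, 2))  # (3, 5): smallest two are 1+2, largest two are 3+2
```

Without the histogram, a node that stores only a count and an aggregate value cannot distinguish these two extremes, so any estimate built from it has to be looser.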
