A hybrid framework for mining high-utility itemsets in a sparse transaction database

High-utility itemset mining aims to find the set of items with utility no less than a user-defined threshold in a transaction database. High-utility itemset mining is an emerging research area in the field of data mining and has important applications in inventory management, query recommendation, systems operation research, bio-medical analysis, etc. Currently, known algorithms for this problem can be classified as either 1-phase or 2-phase algorithms. The 2-phase algorithms typically consist of tree-based algorithms which generate candidate high-utility itemsets and verify them later. A tree data structure generates candidate high-utility itemsets quickly by storing some upper bound utility estimate at each node. The 1-phase algorithms typically consist of inverted-list based and transaction projection based algorithms which avoid the generation of candidate high-utility itemsets. The inverted list and transaction projection allows computation of exact utility estimates. We propose a novel hybrid framework that combines a tree-based and an inverted-list based algorithm to efficiently mine high-utility itemsets. Algorithms based on the framework can harness benefits of both types of algorithms. We report experiment results on real and synthetic datasets to demonstrate the effectiveness of our framework.

[1]  Philip S. Yu,et al.  Online mining of temporal maximal utility itemsets from data streams , 2010, SAC '10.

[2]  Vincent S. Tseng,et al.  FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning , 2014, ISMIS.

[3]  Vikram Goyal,et al.  UP-Hist Tree: An Efficient Data Structure for Mining High Utility Patterns from Transaction Databases , 2015, IDEAS.

[4]  Chin-Chen Chang,et al.  Isolated items discarding strategy for discovering high utility itemsets , 2008, Data Knowl. Eng..

[5]  Antonio Gomariz,et al.  SPMF: a Java open-source pattern mining library , 2014, J. Mach. Learn. Res..

[6]  Philip S. Yu,et al.  Mining High Utility Mobile Sequential Patterns in Mobile Commerce Environments , 2011, DASFAA.

[7]  Tzung-Pei Hong,et al.  An efficient projection-based indexing approach for mining high utility itemsets , 2012, Knowledge and Information Systems.

[8]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[9]  Lan Vu,et al.  A Fast Algorithm Combining FP-Tree and TID-List for Frequent Pattern Mining , 2011 .

[10]  Longbing Cao,et al.  Efficiently Mining Top-K High Utility Sequential Patterns , 2013, 2013 IEEE 13th International Conference on Data Mining.

[11]  Bart Goethals,et al.  Advances in frequent itemset mining implementations: report on FIMI'03 , 2004, SKDD.

[12]  Qiang Yang,et al.  Mining high utility itemsets , 2003, Third IEEE International Conference on Data Mining.

[13]  Vikram Goyal,et al.  An Efficient Algorithm for Mining High-Utility Itemsets with Discount Notion , 2015, BDA.

[14]  Raj P. Gopalan,et al.  Efficient Mining of High Utility Itemsets from Large Datasets , 2008, PAKDD.

[15]  Philip S. Yu,et al.  Efficient algorithms for mining maximal high utility itemsets from data streams with different models , 2012, Expert Syst. Appl..

[16]  Dhaval Patel,et al.  Top-K High Utility Episode Mining from a Complex Event Sequence , 2016, COMAD.

[17]  Keun Ho Ryu,et al.  High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates , 2014, Expert Syst. Appl..

[18]  Philip S. Yu,et al.  Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases , 2013, IEEE Transactions on Knowledge and Data Engineering.

[19]  Mengchi Liu,et al.  Mining high utility itemsets without candidate generation , 2012, CIKM.

[20]  Young-Koo Lee,et al.  HUC-Prune: an efficient candidate pruning technique to mine high utility patterns , 2011, Applied Intelligence.

[21]  R. Agarwal Fast Algorithms for Mining Association Rules , 1994, VLDB 1994.

[22]  Suh-Yin Lee,et al.  Fast and Memory Efficient Mining of High Utility Itemsets in Data Streams , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[23]  Ho-Jin Choi,et al.  Interactive mining of high utility patterns over data streams , 2012, Expert Syst. Appl..

[24]  Ashish Sureka,et al.  High Utility Rare Itemset Mining over Transaction Databases , 2015, DNIS.

[25]  A. Choudhary,et al.  A fast high utility itemsets mining algorithm , 2005, UBDM '05.

[26]  Philip S. Yu,et al.  An effective hash-based algorithm for mining association rules , 1995, SIGMOD '95.

[27]  Philip S. Yu,et al.  Mining high utility episodes in complex event sequences , 2013, KDD.

[28]  Vincent S. Tseng,et al.  EFIM: A Highly Efficient Algorithm for High-Utility Itemset Mining , 2015, MICAI.

[29]  Zhan Li,et al.  Knowledge and Information Systems , 2007 .

[30]  Jian Pei,et al.  Mining frequent patterns without candidate generation , 2000, SIGMOD '00.

[31]  Young-Koo Lee,et al.  Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases , 2009, IEEE Transactions on Knowledge and Data Engineering.

[32]  Philip S. Yu,et al.  UP-Growth: an efficient algorithm for high utility itemset mining , 2010, KDD.

[33]  Srinivasan Parthasarathy,et al.  New Algorithms for Fast Discovery of Association Rules , 1997, KDD.

[34]  Ying Liu,et al.  A Two-Phase Algorithm for Fast Discovery of High Utility Itemsets , 2005, PAKDD.

[35]  Longbing Cao,et al.  USpan: an efficient algorithm for mining high utility sequential patterns , 2012, KDD.