Binary partition for itemsets expansion in mining high utility itemsets

High utility itemset mining has emerged to address the limitations of frequent itemset mining. It uses relevance measures that reflect both statistical significance and user expectations. Whether they search breadth-first or depth-first, most methods generate new candidates by 1-extension of existing itemsets, that is, by adding a single item to an already verified itemset. As an alternative to 1-extension, we introduce an itemset expansion method based on binary partition. We then define the transaction utility list and the key support count, and discuss a new pruning strategy. Based on the new expansion method and pruning strategy, we propose an efficient high utility itemset mining algorithm called BPHUI-Mine (Binary Partition-based High Utility Itemsets Mine). Experiments on publicly available datasets show that the proposed algorithm outperforms other state-of-the-art algorithms.
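
For readers unfamiliar with the setting, the following minimal Python sketch illustrates the conventional baseline that the abstract contrasts against: computing the utility of an itemset and generating candidates by 1-extension. It is not BPHUI-Mine and does not implement binary partition, the transaction utility list, or the key support count. The toy database, the `unit_profit` table, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy database: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2, "c": 1},
    {"a": 2, "c": 3},
    {"b": 1, "c": 1, "d": 2},
]
# Hypothetical per-item unit profit (the "external utility").
unit_profit = {"a": 5, "b": 2, "c": 1, "d": 4}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over transactions containing every item."""
    total = 0
    for t in transactions:
        if all(i in t for i in itemset):
            total += sum(t[i] * unit_profit[i] for i in itemset)
    return total


def one_extensions(itemset, items):
    """1-extension: append one item that sorts after the itemset's last item."""
    last = itemset[-1] if itemset else ""
    return [itemset + (i,) for i in sorted(items) if i > last]


def mine_high_utility(transactions, unit_profit, min_util):
    """Exhaustive depth-first baseline with no pruning (tiny inputs only)."""
    items = sorted(unit_profit)
    result = {}
    stack = [()]
    while stack:
        current = stack.pop()
        for cand in one_extensions(current, items):
            u = itemset_utility(cand, transactions, unit_profit)
            if u >= min_util:
                result[cand] = u
            # Utility is not anti-monotone, so a low-utility candidate
            # may still have a high-utility superset; keep expanding.
            stack.append(cand)
    return result


high = mine_high_utility(transactions, unit_profit, min_util=10)
```

On this toy data, `("a", "b")` has utility 9 and falls below the threshold of 10, while its superset `("a", "b", "c")` reaches 10. This is why support-style downward-closure pruning cannot be applied to utility directly, and why high utility itemset miners rely on dedicated pruning strategies.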
