Mining correlated high-utility itemsets using various measures

Discovering high-utility itemsets consists of finding sets of items that yield a high profit in customer transaction databases. An important limitation of traditional high-utility itemset mining is that only the utility measure is used to assess the interestingness of patterns. This leads to the discovery of many itemsets that have a high profit but contain weakly correlated items. To address this issue, this paper proposes integrating the concept of correlation into high-utility itemset mining, using the all-confidence and bond measures, to find profitable itemsets whose items are highly correlated. An efficient algorithm named FCHM (Fast Correlated High-utility itemset Miner) is proposed to discover correlated high-utility itemsets. Two versions of the algorithm are presented, named FCHMall-confidence and FCHMbond, based on the all-confidence and bond measures, respectively. An experimental evaluation was carried out on four real-life benchmark datasets from the high-utility itemset mining literature: mushroom, retail, kosarak and foodmart. Results show that FCHM is efficient and can prune a large number of weakly correlated high-utility itemsets.
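To make the problem statement concrete, the following is a minimal brute-force sketch. It is not FCHM and does not reproduce its optimizations. It assumes the usual high-utility definitions, where the utility of an itemset X is the sum of quantity × unit profit of X's items over the transactions that contain X. It also assumes the standard correlation measures. All-confidence is sup(X) divided by the largest single-item support in X. Bond is sup(X) divided by the number of transactions containing at least one item of X. The transaction format, function names, and example data are hypothetical and chosen for illustration. Because both measures are anti-monotone (adding an item never increases them), a depth-first search can skip the entire subtree of any itemset that falls below the correlation threshold.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: each transaction maps an item to its
# purchase quantity; `profit` maps an item to its unit profit.
Transaction = Dict[str, int]
Measure = Callable[[Sequence[str], List[Transaction]], float]


def support(itemset: Sequence[str], db: List[Transaction]) -> int:
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in db if all(i in t for i in itemset))


def utility(itemset: Sequence[str], db: List[Transaction],
            profit: Dict[str, int]) -> int:
    """u(X): quantity * unit profit of X's items, summed over transactions containing X."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def all_confidence(itemset: Sequence[str], db: List[Transaction]) -> float:
    """sup(X) divided by the largest support of a single item of X."""
    return support(itemset, db) / max(support([i], db) for i in itemset)


def bond(itemset: Sequence[str], db: List[Transaction]) -> float:
    """sup(X) divided by the number of transactions containing at least one item of X."""
    disjunctive = sum(1 for t in db if any(i in t for i in itemset))
    return support(itemset, db) / disjunctive


def correlated_huis(db: List[Transaction], profit: Dict[str, int],
                    min_util: int, min_corr: float,
                    measure: Measure) -> List[Tuple[Tuple[str, ...], int, float]]:
    """Exhaustive depth-first search pruned only by the correlation measure."""
    items = sorted({i for t in db for i in t})
    results: List[Tuple[Tuple[str, ...], int, float]] = []

    def search(prefix: List[str], start: int) -> None:
        for k in range(start, len(items)):
            x = prefix + [items[k]]
            corr = measure(x, db)
            # All-confidence and bond are anti-monotone: if X is below
            # min_corr, every superset is too, so skip X's whole subtree.
            if corr < min_corr:
                continue
            u = utility(x, db, profit)
            if u >= min_util:
                results.append((tuple(x), u, corr))
            search(x, k + 1)

    search([], 0)
    return results


# Illustrative toy database (hypothetical data).
db = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
    {"a": 1, "b": 2, "c": 1, "d": 6, "e": 1},
    {"b": 4, "c": 3, "d": 3, "e": 1},
    {"b": 2, "c": 2, "e": 1},
]
profit = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}

for itemset, u, corr in correlated_huis(db, profit, 20, 0.5, bond):
    print(itemset, u, round(corr, 2))
```

In this toy data, {a, b} appears together in only one of the five transactions that contain a or b. Its bond is therefore 0.2, so it and all of its supersets are discarded regardless of their profit. Passing `all_confidence` instead of `bond` switches the measure without changing the search.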
