FDHUP: Fast algorithm for mining discriminative high utility patterns

Recently, high utility pattern mining (HUPM) has been extensively studied. Many approaches for HUPM have been proposed in recent years, but most of them aim at mining HUPs without any consideration for their frequency. This has the major drawback that any combination of a low utility item with a very high utility pattern is regarded as a HUP, even if this combination has low affinity and contains items that rarely co-occur. Thus, frequency should be a key criterion to select HUPs. To address this issue, and derive high utility interesting patterns (HUIPs) with strong frequency affinity, the HUIPM algorithm was proposed. However, it recursively constructs a series of conditional trees to produce candidates and then derive the HUIPs. This procedure is time-consuming and may lead to a combinatorial explosion when the minimum utility threshold is set relatively low. In this paper, an efficient algorithm named fast algorithm for mining discriminative high utility patterns (DHUPs) with strong frequency affinity (FDHUP) is proposed to efficiently discover DHUPs by considering both the utility and frequency affinity constraints. Two compact structures named EI-table and FU-tree and three pruning strategies are introduced in the proposed algorithm to reduce the search space, and efficiently and effectively discover DHUPs. An extensive experimental study shows that the proposed FDHUP algorithm considerably outperforms the state-of-the-art HUIPM algorithm in terms of execution time, memory consumption, and scalability.

[1]  Tzung-Pei Hong,et al.  Efficient algorithms for mining up-to-date high-utility patterns , 2015, Adv. Eng. Informatics.

[2]  Tzung-Pei Hong,et al.  An efficient algorithm to mine high average-utility itemsets , 2016, Adv. Eng. Informatics.

[3]  Jiawei Han,et al.  CCMine: Efficient Mining of Confidence-Closed Correlated Patterns , 2004, PAKDD.

[4]  Jian Pei,et al.  Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach , 2006, Sixth IEEE International Conference on Data Mining - Workshops (ICDMW'06).

[5]  Howard J. Hamilton,et al.  Mining itemset utilities from transaction databases , 2006, Data Knowl. Eng..

[6]  Tzung-Pei Hong,et al.  Efficient algorithms for mining high-utility itemsets in uncertain databases , 2016, Knowl. Based Syst..

[7]  Philip S. Yu,et al.  UP-Growth: an efficient algorithm for high utility itemset mining , 2010, KDD.

[8]  Edward Omiecinski,et al.  Alternative Interest Measures for Mining Associations in Databases , 2003, IEEE Trans. Knowl. Data Eng..

[9]  Hui Xiong,et al.  Mining strong affinity association patterns in data sets with skewed support distribution , 2003, Third IEEE International Conference on Data Mining.

[10]  Ramakrishnan Srikant,et al.  Fast Algorithms for Mining Association Rules in Large Databases , 1994, VLDB.

[11]  Philip S. Yu,et al.  Data Mining: An Overview from a Database Perspective , 1996, IEEE Trans. Knowl. Data Eng..

[12]  Ron Rymon,et al.  Search through Systematic Set Enumeration , 1992, KR.

[13]  Tzung-Pei Hong,et al.  Discovery of high utility itemsets from on-shelf time periods of products , 2011, Expert Syst. Appl..

[14]  Philip S. Yu,et al.  Efficient Algorithms for Mining the Concise and Lossless Representation of High Utility Itemsets , 2015, IEEE Transactions on Knowledge and Data Engineering.

[15]  Philip S. Yu,et al.  Efficient Algorithms for Mining High Utility Itemsets from Transactional Databases , 2013, IEEE Transactions on Knowledge and Data Engineering.

[16]  Tzung-Pei Hong,et al.  An effective tree structure for mining high utility itemsets , 2011, Expert Syst. Appl..

[17]  Tomasz Imielinski,et al.  Database Mining: A Performance Perspective , 1993, IEEE Trans. Knowl. Data Eng..

[18]  Ying Liu,et al.  A Two-Phase Algorithm for Fast Discovery of High Utility Itemsets , 2005, PAKDD.

[19]  Philippe Fournier-Viger,et al.  FOSHU: faster on-shelf high utility itemset mining -- with or without negative unit profit , 2015, SAC.

[20]  Tomasz Imielinski,et al.  Mining association rules between sets of items in large databases , 1993, SIGMOD Conference.

[21]  Ho-Jin Choi,et al.  A framework for mining interesting high utility patterns with a strong frequency affinity , 2011, Inf. Sci..

[22]  Tzung-Pei Hong,et al.  A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification , 2015, Adv. Eng. Informatics.

[23]  Philip S. Yu,et al.  Mining top-K high utility itemsets , 2012, KDD.

[24]  Vincent S. Tseng,et al.  Mining high-utility itemsets with various discount strategies , 2015, DSAA.

[25]  Tzung-Pei Hong,et al.  Mining Fuzzy Multiple-Level Association Rules from Quantitative Data , 2004, Applied Intelligence.

[26]  Alicia Troncoso Lora,et al.  Selecting the best measures to discover quantitative association rules , 2014, Neurocomputing.

[27]  Tzung-Pei Hong,et al.  Incrementally Updating High-Utility Itemsets with Transaction Insertion , 2014, ADMA.

[28]  Qiang Yang,et al.  Mining high utility itemsets , 2003, Third IEEE International Conference on Data Mining.

[29]  Cory J. Butz,et al.  A Foundational Approach to Mining Itemset Utilities from Databases , 2004, SDM.

[30]  Tzung-Pei Hong,et al.  Mining high-utility itemsets with various discount strategies , 2015, 2015 IEEE International Conference on Data Science and Advanced Analytics (DSAA).

[31]  Tzung-Pei Hong,et al.  Mining association rules from quantitative data , 1999, Intell. Data Anal..

[32]  Tzung-Pei Hong,et al.  On-shelf utility mining with negative item values , 2014, Expert Syst. Appl..

[33]  Young-Koo Lee,et al.  Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases , 2009, IEEE Transactions on Knowledge and Data Engineering.

[34]  Antonio Gomariz,et al.  SPMF: a Java open-source pattern mining library , 2014, J. Mach. Learn. Res..

[35]  Vincent S. Tseng,et al.  FHM: Faster High-Utility Itemset Mining Using Estimated Utility Co-occurrence Pruning , 2014, ISMIS.

[36]  Tzung-Pei Hong,et al.  An efficient projection-based indexing approach for mining high utility itemsets , 2012, Knowledge and Information Systems.

[37]  Mengchi Liu,et al.  Mining high utility itemsets without candidate generation , 2012, CIKM.