Mining High-Utility Itemsets with Multiple Minimum Utility Thresholds

High-utility itemset mining (HUIM) is an emerging topic in data mining. It consists of discovering high-utility itemsets (HUIs), i.e., groups of items (itemsets) that generate a high profit in transactional databases. Several algorithms have been proposed for this task. However, they share an important limitation: they rely on a single minimum utility threshold as the sole criterion for identifying HUIs. In this paper, we address this issue by introducing the novel framework of HUIM with multiple minimum utility thresholds (HUIM-MMU). In this framework, the user may specify a different minimum utility threshold for each item when discovering HUIs. To perform HUIM-MMU, we first present an algorithm named HUI-MMU, which relies on a new sorted downward closure (SDC) property and the least minimum utility (LMU) threshold. We then propose an improved algorithm, HUI-MMUTID, which uses a TID-index strategy to increase mining performance. Extensive experiments on both real-life and synthetic datasets show that the two proposed algorithms efficiently and effectively discover the complete set of HUIs in transactional databases under multiple minimum utility thresholds.
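To make the framework concrete, the sketch below mines HUIs under per-item thresholds on a toy database. It assumes the standard HUIM definitions (the utility of an item in a transaction is unit profit times quantity, u(X) sums that over the transactions containing X, and TWU(X) sums the full transaction utilities of those same transactions) and reads the HUIM-MMU criterion as u(X) >= MIU(X), the smallest threshold among the items of X. The ascending-threshold pruning, the LMU filter and the per-item transaction-id sets are an illustrative reconstruction of the ideas named in the abstract, not the paper's HUI-MMU or HUI-MMUTID implementation; the names `mine_hui_mmu`, `profit`, `mu` and the toy data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical toy format: a transaction maps each item to its purchased quantity.
Transaction = Dict[str, int]


def transaction_utility(t: Transaction, profit: Dict[str, int]) -> int:
    """TU(T): sum of unit profit x quantity over all items of T."""
    return sum(profit[i] * q for i, q in t.items())


def mine_hui_mmu(db: List[Transaction], profit: Dict[str, int],
                 mu: Dict[str, int]) -> Dict[FrozenSet[str], int]:
    """Return every itemset X with u(X) >= MIU(X) = min of mu(i) over i in X."""
    tu = [transaction_utility(t, profit) for t in db]

    # Assumed TID index: item -> ids of the transactions that contain it.
    tids: Dict[str, Set[int]] = {}
    for k, t in enumerate(db):
        for i in t:
            tids.setdefault(i, set()).add(k)

    # LMU filter: an item whose TWU is below the smallest threshold cannot
    # appear in any qualifying itemset, because u(X) <= TWU(X) <= TWU(i).
    lmu = min(mu.values())
    items = [i for i in tids if sum(tu[k] for k in tids[i]) >= lmu]

    # Ascending threshold order: the first item of each itemset built below
    # carries the itemset's minimum threshold.
    items.sort(key=lambda i: (mu[i], i))
    results: Dict[FrozenSet[str], int] = {}

    def extend(prefix: Tuple[str, ...], prefix_tids: Optional[Set[int]],
               start: int) -> None:
        for k in range(start, len(items)):
            item = items[k]
            x = prefix + (item,)
            x_tids = tids[item] if prefix_tids is None else prefix_tids & tids[item]
            twu = sum(tu[j] for j in x_tids)
            miu = mu[x[0]]
            # Extensions only add items with thresholds >= mu[x[0]], so MIU stays
            # fixed while TWU can only shrink: prune x and its whole subtree.
            if twu < miu:
                continue
            u = sum(profit[i] * db[j][i] for j in x_tids for i in x)
            if u >= miu:
                results[frozenset(x)] = u
            extend(x, x_tids, k + 1)

    extend((), None, 0)
    return results


profit = {"a": 5, "b": 2, "c": 1, "d": 2, "e": 3}   # unit profits
mu = {"a": 20, "b": 15, "c": 10, "d": 12, "e": 25}  # per-item thresholds
db = [
    {"a": 1, "c": 1, "d": 1},
    {"a": 2, "c": 6, "e": 2},
    {"b": 2, "c": 2, "e": 6},
    {"b": 4, "c": 3, "d": 3, "e": 1},
]

result = mine_hui_mmu(db, profit, mu)
print(result[frozenset({"c", "e"})])  # 38
print(frozenset({"a"}) in result)     # False: u({a}) = 15 < mu(a) = 20
```

With a single global threshold of 20, neither {a} (utility 15) nor {a, c} (utility 22, which clears it) would be judged by item-specific criteria; here {a, c} qualifies against the lower threshold of c, which is the kind of flexibility the framework targets. The pruning is safe because sorting by ascending threshold keeps MIU constant along each branch of the search, while TWU never increases as items are added and always bounds utility from above.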
