Mining high utility itemsets using extended chain structure and utility machine

Abstract: High utility itemsets are sets of items that have a high utility (e.g. a high profit or a high importance) in a transaction database. Discovering high utility itemsets has many important real-life applications, such as market basket analysis. Nonetheless, mining these patterns is time-consuming because of the huge search space and the high cost of utility computation. Most previous work is devoted to search space pruning and pays little attention to utility computation. In fact, both search space pruning and high utility itemset identification rely on computing various utilities. This paper proposes a novel algorithm named REX (Rapid itEmset eXtraction), which extends the classic d2HUP algorithm with an improved structure, a k-item utility machine, and an efficient switch strategy. The structure significantly reduces the time complexity of utility computation compared with the original structure used in d2HUP. The machine quickly merges identical transactions and applies an efficient procedure for computing the utilities of the extensions of a given itemset. The strategy, derived by trial and error, yields drastic performance improvements on some databases and remains competitive with the switch strategy used in d2HUP on the others. Experimental results show that REX achieves a speedup ranging from fifty percent to three orders of magnitude over d2HUP, even though both use identical pruning techniques, and that REX considerably outperforms state-of-the-art algorithms on real-life and synthetic databases.
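
To make the notion of utility concrete, the following is a minimal brute-force sketch in Python. It is not REX or d2HUP, and it does no pruning. It assumes the standard definition from the high utility itemset mining literature: the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, and its utility in the database is the sum over all transactions that contain it. The toy database, the profit table, and the names `utility` and `high_utility_itemsets` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Hypothetical toy data: unit profit per item, and transactions
# mapping each purchased item to its quantity.
unit_profit = {"a": 5, "b": 2, "c": 1}
database = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 3},
    {"a": 1, "b": 2},
    {"b": 1, "c": 4},
]


def utility(itemset, db, profit):
    """Sum quantity * unit profit over every transaction containing itemset."""
    total = 0
    for transaction in db:
        if all(item in transaction for item in itemset):
            total += sum(transaction[item] * profit[item] for item in itemset)
    return total


def high_utility_itemsets(db, profit, min_util):
    """Enumerate every itemset and keep those with utility >= min_util.

    Exhaustive enumeration is exponential in the number of items; it only
    illustrates why utility computation dominates the cost of mining.
    """
    items = sorted({item for transaction in db for item in transaction})
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            u = utility(candidate, db, profit)
            if u >= min_util:
                result[candidate] = u
    return result


if __name__ == "__main__":
    print(high_utility_itemsets(database, unit_profit, min_util=15))
```

In this toy database the first and third transactions are identical, which is the kind of redundancy that merging identical transactions can exploit. Note also that utility is not monotone: adding an item to an itemset can raise or lower its utility, which is why algorithms in this area depend on upper bounds for pruning instead of the support-based pruning used in frequent itemset mining.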
