SOHUPDS: a single-pass one-phase algorithm for mining high utility patterns over a data stream

High utility pattern mining emerged to overcome a limitation of frequent pattern mining, which treats frequency as the only measure of importance and ignores the actual importance of items. Existing algorithms for mining high utility patterns over a data stream are two-phase algorithms. They do not scale well because the first phase generates a large number of candidates, particularly when the minimum utility threshold is low, and the second phase must scan the database again to compute the actual utility of each candidate. In this paper, we propose a novel algorithm, SOHUPDS, which mines high utility patterns over a data stream with the sliding window technique using the projected database approach. We also propose a data structure, IUDataListSW, which stores the utility and upper-bound values of the items in the current sliding window. IUDataListSW additionally stores the positions of items in the transactions, so that the initial projected database of each item can be obtained efficiently. Furthermore, we propose an update strategy that reuses the high utility patterns mined from the previous sliding window to update the high utility patterns in the current sliding window. As a result, SOHUPDS mines high utility patterns over a data stream in a single pass and in one phase. Experimental results show that SOHUPDS is more efficient than the state-of-the-art algorithms in terms of both execution time and memory usage.
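
To make the problem setting concrete, the following is a minimal brute-force sketch of high utility pattern mining over a sliding window. It is not SOHUPDS and does not reproduce IUDataListSW: the names `PROFIT`, `mine_window`, and `mine_stream`, the per-item summary layout, and the use of transaction-weighted utility (TWU) as the upper bound are all assumptions made for illustration. The sketch only shows what a per-item record of utility, upper bound, and positions might look like, and how the upper bound can prune items before any pattern is evaluated.

```python
from collections import deque
from itertools import combinations

# Hypothetical unit profits (external utilities); not taken from the paper.
PROFIT = {"a": 5, "b": 2, "c": 1}


def transaction_utility(tx):
    """Total utility of a transaction given as {item: quantity}."""
    return sum(PROFIT[item] * qty for item, qty in tx.items())


def itemset_utility(itemset, tx):
    """Utility of an itemset in one transaction; 0 if any item is absent."""
    if not all(item in tx for item in itemset):
        return 0
    return sum(PROFIT[item] * tx[item] for item in itemset)


def mine_window(window, min_util):
    """Return {itemset: utility} for all high utility itemsets in one window."""
    # Per-item summary: exact utility, TWU upper bound, and the positions
    # (indices) of the window transactions that contain the item.
    summary = {}
    for pos, tx in enumerate(window):
        tu = transaction_utility(tx)
        for item, qty in tx.items():
            entry = summary.setdefault(
                item, {"utility": 0, "twu": 0, "positions": []}
            )
            entry["utility"] += PROFIT[item] * qty
            entry["twu"] += tu
            entry["positions"].append(pos)

    # An item whose TWU is below the threshold cannot occur in any
    # high utility itemset, so it is dropped before enumeration.
    promising = sorted(i for i, e in summary.items() if e["twu"] >= min_util)

    result = {}
    for size in range(1, len(promising) + 1):
        for itemset in combinations(promising, size):
            # Only transactions containing the first item can contribute.
            positions = summary[itemset[0]]["positions"]
            utility = sum(itemset_utility(itemset, window[p]) for p in positions)
            if utility >= min_util:
                result[itemset] = utility
    return result


def mine_stream(stream, window_size, min_util):
    """Yield the high utility itemsets of each full sliding window."""
    window = deque(maxlen=window_size)  # oldest transaction drops out automatically
    for tx in stream:
        window.append(tx)
        if len(window) == window_size:
            yield mine_window(list(window), min_util)


stream = [{"a": 1, "b": 2}, {"b": 1, "c": 3}, {"a": 2, "c": 1}]
results = list(mine_stream(stream, window_size=2, min_util=9))
```

This sketch re-mines every window from scratch and enumerates all combinations of the surviving items, which is exactly the cost that a projected database and an update strategy across windows are meant to avoid.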
