Mining recent high average utility patterns based on sliding window from stream data

Utility pattern mining is a technique that finds valuable patterns in large databases in which each item carries importance and quantity information. The representative utility pattern mining technique, high utility pattern mining (HUPM), calculates the utility of a pattern by summing the utilities of all the items in the pattern. However, this utility measure has a drawback: long patterns tend to accumulate enough utility to qualify as high utility patterns simply because they contain more items. For this reason, high average utility pattern mining (HAUPM), which employs a different utility measure, has been studied in order to take pattern length into account. Meanwhile, techniques for handling stream data have become necessary because many data sources, e.g. sensors and POS devices, produce data in real time. However, none of the existing HAUPM algorithms can find up-to-date, meaningful patterns over data streams. We thus propose the first sliding window based HAUPM algorithm, which discovers recent high average utility patterns over data streams. Based on the sliding window model, our algorithm divides stream data into batches and keeps only the most recent batches in its window. Thereby, the algorithm can mine recent, important patterns over data streams. We also introduce a new strategy that enhances the performance of our algorithm by minimizing the overestimated average utilities stored in the proposed data structure. The experimental results show that our algorithm outperforms the competing algorithms.
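To make the two ideas above concrete, the following is a minimal sketch of average utility combined with a batch-based sliding window. It is not the proposed algorithm: it enumerates every itemset by brute force and uses neither the paper's data structure nor its overestimation-reducing strategy. The names `SlidingWindow`, `average_utility`, `mine_recent_haups`, the `profit` table, the absolute threshold `min_au`, and the toy data are all assumptions made for illustration. The average utility of a pattern in a transaction is taken here as the sum of its item utilities (quantity times unit profit) divided by the pattern length.

```python
from collections import deque
from itertools import combinations


def average_utility(pattern, transactions, profit):
    """Sum, over transactions containing `pattern`, of the pattern's
    utility in that transaction divided by the pattern length."""
    total = 0.0
    for t in transactions:
        if all(item in t for item in pattern):
            utility = sum(t[item] * profit[item] for item in pattern)
            total += utility / len(pattern)
    return total


class SlidingWindow:
    """Keeps only the most recent `max_batches` batches of transactions."""

    def __init__(self, max_batches):
        # deque with maxlen drops the oldest batch automatically
        self.batches = deque(maxlen=max_batches)

    def add_batch(self, batch):
        self.batches.append(batch)

    def transactions(self):
        return [t for batch in self.batches for t in batch]


def mine_recent_haups(window, profit, min_au):
    """Brute-force enumeration (illustrative only, exponential in the
    number of items) of patterns whose average utility is >= min_au."""
    txs = window.transactions()
    items = sorted({item for t in txs for item in t})
    result = {}
    for k in range(1, len(items) + 1):
        for pattern in combinations(items, k):
            au = average_utility(pattern, txs, profit)
            if au >= min_au:
                result[pattern] = au
    return result


# Hypothetical unit profits and batches; each transaction maps item -> quantity.
profit = {"a": 5, "b": 2, "c": 1}
batch1 = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]
batch2 = [{"a": 2, "c": 1}]
batch3 = [{"a": 1, "b": 1, "c": 1}]

window = SlidingWindow(max_batches=2)
window.add_batch(batch1)
window.add_batch(batch2)
before = mine_recent_haups(window, profit, min_au=6)

window.add_batch(batch3)  # batch1 slides out of the window
after = mine_recent_haups(window, profit, min_au=6)
```

With this toy data, `before` contains the patterns `('a',)` and `('b',)`, while `after` contains `('a',)` and `('a', 'c')`: once the oldest batch leaves the window, the set of recent high average utility patterns changes accordingly.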
