Mining top-k high-utility itemsets from a data stream under sliding window model

High-utility itemset mining has gained significant attention in the past few years. It aims to find sets of items (itemsets) in a database whose utility is no less than a user-defined threshold. The notion of utility gives an analyst more flexibility in mining relevant itemsets. Nowadays, continuous and unbounded streams of data are generated by web clicks, transaction flows from retail stores, sensor networks, and similar sources. Mining high-utility itemsets from a data stream is challenging because the incoming data has to be processed on the fly under time and memory constraints. The number of high-utility itemsets depends on the user-defined threshold: a very low threshold can yield an overwhelming number of itemsets, whereas a very high threshold can yield very few. Setting a threshold that returns a reasonable number of itemsets can therefore be tedious. Top-k high-utility itemset mining was introduced to address this issue; here, k is the user-defined number of high-utility itemsets in the result set. In this paper, we propose a data structure and an efficient algorithm for mining top-k high-utility itemsets from a data stream. The algorithm has a single phase and does not generate any candidates, unlike many algorithms that work in two phases, i.e., candidate generation followed by candidate verification. We conduct extensive experiments on several real and synthetic datasets. Experimental results demonstrate that, in terms of computation time, our proposed algorithm is 20 to 80 times faster on sparse datasets and 300 to 700 times faster on dense datasets than the state-of-the-art algorithm. Furthermore, our proposed algorithm requires less memory than the state-of-the-art algorithm.
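
To make the problem statement concrete, the following is a minimal brute-force sketch of top-k high-utility itemset mining over a sliding window. It is not the algorithm proposed in the paper: it enumerates every itemset, which is exponential in the number of distinct items, and serves only to illustrate what is being computed. The names `PROFIT`, `itemset_utility`, `top_k_itemsets`, and `mine_stream` are hypothetical, the window is assumed to hold a fixed number of the most recent transactions, and the utility of an item in a transaction is assumed to be its quantity times a per-unit profit.

```python
from collections import deque
from itertools import combinations
import heapq

# Assumed per-unit profit (external utility) of each item; illustrative values only.
PROFIT = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, window):
    """Sum the utility of `itemset` over the transactions that contain all of its items."""
    total = 0
    for tx in window:  # each tx maps item -> purchased quantity
        if all(item in tx for item in itemset):
            total += sum(tx[item] * PROFIT[item] for item in itemset)
    return total


def top_k_itemsets(window, k):
    """Return the k highest-utility itemsets in the window by exhaustive enumeration."""
    items = sorted({item for tx in window for item in tx})
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, window)
            if utility > 0:  # skip itemsets that never occur together
                scored.append((utility, itemset))
    return heapq.nlargest(k, scored)


def mine_stream(stream, window_size, k):
    """Yield the top-k itemsets each time the window is full and a transaction arrives."""
    window = deque(maxlen=window_size)  # the oldest transaction is dropped automatically
    for tx in stream:
        window.append(tx)
        if len(window) == window_size:
            yield top_k_itemsets(window, k)
```

Because the user supplies k rather than a utility threshold, the result size is fixed regardless of how utilities are distributed in the current window. A practical algorithm would avoid recomputing every itemset from scratch on each slide, which is the cost the single-phase approach described above is meant to reduce.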
