An Algorithm of Top-k High Utility Itemsets Mining over Data Stream

Existing top-k high utility itemset (HUI) mining algorithms generate candidate itemsets during mining, so their time and space performance can degrade severely when the dataset is large or contains many long transactions. Performance is even more critical when such algorithms are applied to data streams. To address this issue, we propose TOPK-SW, a sliding-window-based algorithm for mining top-k HUIs. It first stores each batch of data in the current window, together with the items' utility information, in a tree called HUI-Tree, which allows utility values to be retrieved without rescanning the dataset and thereby improves mining performance. TOPK-SW was tested on four classical datasets; the results show that it significantly outperforms existing algorithms in both time and space efficiency, with running time in particular improved by more than one order of magnitude.
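To make the problem setting concrete, the following is a minimal brute-force sketch in Python of top-k HUI mining over a sliding window of batches. It is not TOPK-SW and does not build an HUI-Tree: it simply enumerates every itemset in the window, which is exactly the cost a tree-based method aims to avoid. All names (`SlidingWindowTopK`, `top_k_hui`, `add_batch`) are hypothetical, and each transaction is assumed to be a dict mapping an item to its utility in that transaction (for example, quantity times unit profit).

```python
from collections import deque
from itertools import combinations
import heapq


def top_k_hui(window, k):
    """Return the k itemsets with the highest total utility in the window.

    Brute force (assumption for illustration): every non-empty subset of
    every transaction is enumerated, so this is exponential in the
    transaction length.
    """
    totals = {}
    for batch in window:
        for transaction in batch:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                for itemset in combinations(items, size):
                    # Utility of an itemset in a transaction that contains it
                    # is the sum of its items' utilities in that transaction.
                    utility = sum(transaction[item] for item in itemset)
                    totals[itemset] = totals.get(itemset, 0) + utility
    # Highest utility first; ties are broken by itemset order.
    return heapq.nsmallest(k, totals.items(), key=lambda kv: (-kv[1], kv[0]))


class SlidingWindowTopK:
    """Keep the most recent `window_size` batches and re-mine on each arrival."""

    def __init__(self, window_size, k):
        # A deque with maxlen drops the oldest batch automatically.
        self.window = deque(maxlen=window_size)
        self.k = k

    def add_batch(self, batch):
        self.window.append(batch)
        return top_k_hui(self.window, self.k)


miner = SlidingWindowTopK(window_size=2, k=2)
first = miner.add_batch([{"a": 5, "b": 2}, {"b": 4, "c": 3}])
second = miner.add_batch([{"a": 1, "c": 6}])
third = miner.add_batch([{"b": 10}])  # the first batch has now left the window
print(first, second, third, sep="\n")
```

Because the sketch recomputes all utilities from scratch whenever a batch arrives, it rescans the whole window each time; the abstract's claim is that storing batch data and utility information in a tree removes the need for such rescans.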
