Paper Information - An Algorithm of Top-k High Utility Itemsets Mining over Data Stream

Existing top-k high utility itemset (HUI) mining algorithms generate candidate itemsets during mining, so their time and space performance can degrade severely when the dataset is large or contains many long transactions. Performance matters even more when such algorithms are applied to data streams. To address this issue, this paper proposes TOPK-SW, a sliding-window-based top-k HUI mining algorithm. It first stores each batch of data in the current window, together with the items' utility information, in a tree called HUI-Tree, which allows utility values to be retrieved without rescanning the dataset and thereby improves mining performance. TOPK-SW was tested on four classical datasets; the results show that it significantly outperforms existing algorithms in both time and space efficiency, with running time improved by more than one order of magnitude.
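To make the problem setting concrete, the following is a minimal brute-force sketch of top-k high utility itemset mining over a batch-based sliding window. It is not TOPK-SW and does not build an HUI-Tree: it enumerates itemsets directly, which is exactly the kind of cost the abstract says the tree-based approach avoids. The class name `SlidingWindowTopK`, the `max_len` cap, and the representation of a transaction as a mapping from item to its utility in that transaction (for example, quantity times unit profit, already multiplied out) are assumptions made for illustration.

```python
from collections import deque
from itertools import combinations


class SlidingWindowTopK:
    """Brute-force top-k high utility itemsets over a sliding window of batches.

    Illustrative only: this is NOT the TOPK-SW algorithm and uses no HUI-Tree.
    """

    def __init__(self, window_size, k, max_len=3):
        # deque(maxlen=...) silently drops the oldest batch when a new one arrives
        self.window = deque(maxlen=window_size)
        self.k = k
        self.max_len = max_len  # hypothetical cap on itemset length

    def add_batch(self, batch):
        # batch: list of transactions; each transaction is {item: utility}
        self.window.append(batch)

    def top_k(self):
        utilities = {}
        for batch in self.window:
            for tx in batch:
                items = sorted(tx)
                for size in range(1, min(self.max_len, len(items)) + 1):
                    for itemset in combinations(items, size):
                        # utility of an itemset in a transaction is the sum
                        # of the utilities of its items in that transaction
                        u = sum(tx[i] for i in itemset)
                        utilities[itemset] = utilities.get(itemset, 0) + u
        # highest utility first; ties broken by itemset in ascending order
        ranked = sorted(utilities.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[: self.k]


miner = SlidingWindowTopK(window_size=2, k=2)
miner.add_batch([{"a": 5, "b": 2}, {"b": 4, "c": 3}])
miner.add_batch([{"a": 1, "c": 6}])
print(miner.top_k())  # [(('c',), 9), (('a', 'b'), 7)]

# A third batch slides the window: the first batch is dropped.
miner.add_batch([{"b": 3, "c": 9}])
print(miner.top_k())  # [(('c',), 15), (('b', 'c'), 12)]
```

Every call to `top_k` rescans all transactions in the window and enumerates their itemsets. Caching utility information per batch, as the abstract describes for HUI-Tree, is what removes that rescan.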

Yang Liu | Le Wang | Tianjun Lu
