Today's recommendations

2008 - Knowledge and Information Systems

A survey on algorithms for mining frequent itemsets over data streams

Wilfred Ng, James Cheng, Yiping Ke

The increasing prominence of data streams arising in a wide range of advanced applications such as fraud detection and trend learning has led to the study of online mining of frequent itemsets (FIs). Unlike mining static databases, mining data streams poses many new challenges. In addition to the one-scan nature, the unbounded memory requirement and the high data arrival rate of data streams, the combinatorial explosion of itemsets exacerbates the mining task. The high complexity of the FI mining problem hinders the application of the stream mining techniques. We recognize that a critical review of existing techniques is needed in order to design and develop efficient mining algorithms and data structures that are able to match the processing rate of the mining with the high arrival rate of data streams. Within a unifying set of notations and terminologies, we describe in this paper the efforts and main techniques for mining data streams and present a comprehensive survey of a number of the state-of-the-art algorithms on mining frequent itemsets over data streams. We classify the stream-mining techniques into two categories based on the window model that they adopt in order to provide insights into how and why the techniques are useful. Then, we further analyze the algorithms according to whether they are exact or approximate and, for approximate approaches, whether they are false-positive or false-negative. We also discuss various interesting issues, including the merits and limitations in existing research and substantive areas for future research.
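To make the window-based setting in the abstract concrete, here is a minimal sketch of exact frequent-itemset counting over a sliding window of the most recent transactions. It is not any specific algorithm covered by the survey; the class name `SlidingWindowItemsetCounter` and the parameters `window_size`, `max_itemset_size`, and `min_support` are hypothetical names chosen for illustration. Itemset size is capped because enumerating every subset of a transaction is exactly the combinatorial explosion the abstract warns about.

```python
from collections import Counter, deque
from itertools import combinations


class SlidingWindowItemsetCounter:
    """Exact support counts for small itemsets over the last `window_size` transactions.

    Illustrative only: real stream miners use compact summaries instead of
    enumerating every subset of every transaction.
    """

    def __init__(self, window_size, max_itemset_size=2):
        self.window_size = window_size
        self.max_itemset_size = max_itemset_size
        self.window = deque()      # transactions currently inside the window
        self.counts = Counter()    # itemset (sorted tuple) -> support count

    def _itemsets(self, transaction):
        # Enumerate all itemsets of size 1..max_itemset_size in canonical order.
        items = sorted(set(transaction))
        for k in range(1, self.max_itemset_size + 1):
            yield from combinations(items, k)

    def add(self, transaction):
        # One-scan update: count the new transaction, then expire the oldest one.
        self.window.append(transaction)
        self.counts.update(self._itemsets(transaction))
        if len(self.window) > self.window_size:
            expired = self.window.popleft()
            self.counts.subtract(self._itemsets(expired))

    def frequent(self, min_support):
        # min_support is a fraction of the transactions currently in the window.
        threshold = min_support * len(self.window)
        return {s: c for s, c in self.counts.items() if c > 0 and c >= threshold}
```

Because every count is maintained exactly, this sketch yields neither false positives nor false negatives; the approximate approaches the survey classifies give up that guarantee in exchange for bounded memory.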

2012 - Expert Syst. Appl.

Efficient algorithms for mining maximal high utility itemsets from data streams with different models

Philip S. Yu, Vincent S. Tseng, Bai-En Shie

Data stream mining is an emerging research topic in the data mining field. Finding frequent itemsets is one of the most important tasks in data stream mining, with wide applications such as online e-business and web click-stream analysis. However, two main problems exist in related studies: (1) The utilities (e.g., importance or profits) of items are not considered, so frequent itemsets cannot reflect the actual utilities of patterns. (2) Existing utility mining methods produce too many patterns, which makes it difficult for users to filter useful patterns out of the huge result set. In view of this, in this paper we propose a novel framework, named GUIDE (Generation of maximal high Utility Itemsets from Data strEams), to find maximal high utility itemsets from data streams with different models, i.e., landmark, sliding window and time fading models. The proposed structure, named MUI-Tree (Maximal high Utility Itemset Tree), maintains essential information for the mining processes, and the proposed strategies further improve the performance of GUIDE. The main contributions of this paper are as follows: (1) To the best of our knowledge, this is the first work on mining the compact form of high utility patterns from data streams; (2) GUIDE is an effective one-pass framework which meets the requirements of data stream mining; (3) GUIDE generates novel patterns which are not only high utility but also maximal, and which provide compact and insightful hidden information in the data streams. Experimental results show that our approach outperforms the state-of-the-art algorithms under various conditions in data stream environments on different models.
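The notions of "high utility" and "maximal" can be illustrated with a small brute-force sketch over a static batch of transactions. This is not GUIDE or the MUI-Tree, and it ignores the stream models entirely; it assumes the usual definition in which the utility of an itemset in a transaction is the sum of quantity times unit profit over its items, counted only in transactions that contain the whole itemset. The function names and the `profit` and `min_util` parameters are hypothetical.

```python
from itertools import combinations


def itemset_utility(itemset, transactions, profit):
    """Total utility of `itemset` over all transactions that contain every item in it."""
    total = 0
    for t in transactions:  # t maps item -> purchased quantity
        if all(i in t for i in itemset):
            total += sum(t[i] * profit[i] for i in itemset)
    return total


def maximal_high_utility_itemsets(transactions, profit, min_util):
    """Brute force: find high utility itemsets, keep those with no high utility superset."""
    items = sorted({i for t in transactions for i in t})
    high = []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            if itemset_utility(combo, transactions, profit) >= min_util:
                high.append(frozenset(combo))
    # An itemset is maximal if it is not a proper subset of another high utility itemset.
    return [x for x in high if not any(x < y for y in high)]


# Toy data (made up for illustration): item -> quantity per transaction, unit profits.
transactions = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"a": 1, "b": 1, "c": 1}]
profit = {"a": 5, "b": 2, "c": 1}
result = maximal_high_utility_itemsets(transactions, profit, min_util=16)
```

With this toy data and `min_util=16`, the high utility itemsets are {a}, {a, b} and {a, c}; only the last two are maximal, which shows how reporting maximal itemsets shrinks the output the user has to inspect.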
