Moment: maintaining closed frequent itemsets over a stream sliding window
This paper considers the problem of mining closed frequent itemsets over a sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However, monitoring only frequent itemsets make it impossible to detect new itemsets when they become frequent. In this paper, we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of item-sets over a sliding-window. The selected itemsets consist of a boundary between closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the CET. In other words, a status change of any itemset (e.g., from non-frequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent item-sets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET. Our experiments show that our algorithm performs much better than previous approaches.
Catch the moment: maintaining closed frequent itemsets over a data stream sliding window
This paper considers the problem of mining closed frequent itemsets over a data stream sliding window using limited memory space. We design a synopsis data structure to monitor transactions in the sliding window so that we can output the current closed frequent itemsets at any time. Due to time and memory constraints, the synopsis data structure cannot monitor all possible itemsets. However, monitoring only frequent itemsets will make it impossible to detect new itemsets when they become frequent. In this paper, we introduce a compact data structure, the closed enumeration tree (CET), to maintain a dynamically selected set of itemsets over a sliding window. The selected itemsets contain a boundary between closed frequent itemsets and the rest of the itemsets. Concept drifts in a data stream are reflected by boundary movements in the CET. In other words, a status change of any itemset (e.g., from non-frequent to frequent) must occur through the boundary. Because the boundary is relatively stable, the cost of mining closed frequent itemsets over a sliding window is dramatically reduced to that of mining transactions that can possibly cause boundary movements in the CET. Our experiments show that our algorithm performs much better than representative algorithms for the sate-of-the-art approaches.
CFI-Stream: mining closed frequent itemsets in data streams
Mining frequent closed itemsets provides complete and condensed information for non-redundant association rules generation. Extensive studies have been done on mining frequent closed itemsets, but they are mainly intended for traditional transaction databases and thus do not take data stream characteristics into consideration. In this paper, we propose a novel approach for mining closed frequent itemsets over data streams. It computes and maintains closed itemsets online and incrementally, and can output the current closed frequent itemsets in real time based on users' specified thresholds. Experimental results show that our proposed method is both time and space efficient, has good scalability as the number of transactions processed increases and adapts very rapidly to the change in data streams.
Incremental updates of closed frequent itemsets over continuous data streams
Online mining of closed frequent itemsets over streaming data is one of the most important issues in mining data streams. In this paper, we propose an efficient one-pass algorithm, NewMoment to maintain the set of closed frequent itemsets in data streams with a transaction-sensitive sliding window. An effective bit-sequence representation of items is used in the proposed algorithm to reduce the time and memory needed to slide the windows. Experiments show that the proposed algorithm not only attain highly accurate mining results, but also run significant faster and consume less memory than existing algorithm Moment for mining closed frequent itemsets over recent data streams.
genetic algorithm data mining big datum power consumption data structure association rule data stream programmable gate array field programmable gate elliptic curve data mining technique efficient algorithm smart card fpga implementation association rule mining mining algorithm power analysi frequent itemset hyperspectral datum sliding window frequent pattern leaf area apriori algorithm mining association rule leaf area index side channel uncertain datum differentially private leakage power algorithmic approach mining association elliptic curve cryptosystem mining frequent itemset mining curve cryptosystem frequent itemset mining plant leaf power analysis attack differential power analysi item set data mining task data stream mining frequent item analysis attack differential power high utility stream mining mining frequent itemset chlorophyll content maximal frequent mining frequent pattern false negative data mining problem high utility itemset frequent closed frequent itemsets mining utility itemset association mining closed itemset chlorophyll fluorescence itemsets mining transactional datum efficient mining correlation power analysi side channel analysi dpa attack maximal frequent itemset frequent closed itemset mining problem mining maximal frequent mining data stream closed frequent mining maximal itemset mining algorithm simple power analysi mining frequent closed leaf chlorophyll content memory consumption leaf chlorophyll finding frequent closed frequent itemset maximum frequent discovering frequent koblitz curve weighted frequent mining closed estimating leaf vegetative growth cryptographic circuit fast mining airborne spectrographic imager chlorophyll meter compact airborne spectrographic finding frequent itemset mining closed frequent top-k frequent estimation of leaf leakage power analysi discovering frequent itemset transactional data stream parallel frequent weighted frequent itemset prosail model discovery of association approximate frequent mining top-k frequent parallel frequent itemset itemset mining problem probabilistic frequent itemset number of transactions frequent itemsets algorithm find frequent itemset