Mining frequent itemsets over uncertain data streams

In recent years, mining frequent itemsets over uncertain data streams has attracted much attention because of its wide application in sensor network monitoring, RFID, moving object search, and location-based services (LBS). However, existing hyper-structure-based algorithms cannot achieve high mining accuracy. In this paper, we present two sliding-window-based, false-positive-oriented algorithms, uncertain data stream frequent itemsets mining (UFIM) and UFIMTopK, which efficiently find threshold-based and rank-based frequent itemsets, respectively, from uncertain data streams. UFIM uses a global GT-tree to maintain the frequent itemsets in the sliding window and outputs them on request. In addition, an efficient deletion strategy is designed to reduce time overhead. UFIMTopK, a modification of UFIM, finds the top-k frequent itemsets. Experimental results show that UFIM achieves higher mining accuracy than previous algorithms on both synthetic and real-life datasets.
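
To make the two problem settings concrete, the following is a minimal brute-force sketch and is not the UFIM or UFIMTopK algorithm, nor the GT-tree. It assumes the commonly used expected-support model, in which each item in a transaction carries an independent existence probability and the expected support of an itemset is the sum, over the transactions in the window, of the product of its items' probabilities. The function names, the `max_size` limit, and the sample stream are hypothetical and chosen only for illustration.

```python
from collections import deque
from itertools import combinations
from math import prod


def expected_supports(window, max_size=2):
    """Expected support of every itemset of up to max_size items.

    Each transaction is a dict mapping item -> existence probability.
    Assumption: items are independent, so an itemset's probability in a
    transaction is the product of its items' probabilities.
    """
    totals = {}
    for txn in window:
        items = sorted(txn)
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                p = prod(txn[i] for i in itemset)
                totals[itemset] = totals.get(itemset, 0.0) + p
    return totals


def frequent_itemsets(window, min_esup, max_size=2):
    """Threshold-based query: itemsets with expected support >= min_esup."""
    sup = expected_supports(window, max_size)
    return {s: v for s, v in sup.items() if v >= min_esup}


def top_k_itemsets(window, k, max_size=2):
    """Rank-based query: the k itemsets with the highest expected support.

    Ties are broken by the itemset's lexicographic order.
    """
    sup = expected_supports(window, max_size)
    return sorted(sup.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Sliding window holding the 3 most recent transactions; appending to a
# full deque drops the oldest transaction automatically.
window = deque(maxlen=3)
stream = [  # hypothetical uncertain transactions
    {"a": 0.5, "b": 0.5},
    {"a": 1.0, "c": 0.5},
    {"a": 0.5, "b": 1.0},
    {"b": 0.5, "c": 1.0},
]
for txn in stream:
    window.append(txn)

print(frequent_itemsets(window, min_esup=1.0))
print(top_k_itemsets(window, k=2))
```

This sketch recomputes all supports from scratch on every query, which is exactly the cost that an incrementally maintained structure with an efficient deletion strategy is meant to avoid.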