An adaptive approximation method to discover frequent itemsets over sliding-window-based data streams

Abstract Frequent-pattern discovery in data streams is more challenging than in traditional databases because several additional requirements must be satisfied. In the sliding-window model of data streams, transactions both enter and leave the window at each slide. In this paper, we propose an approximation method for mining frequent itemsets over the sliding window of a data stream. The proposed method approximates the counts of itemsets from the counts of their subsets instead of scanning the transactions for them. Because the sliding-window model is more dynamic than other stream models, we devise a technique that enables the proposed method to approximate itemset counts adaptively. In addition, we design a second technique that adjusts and corrects the approximations. Empirical results show that the proposed method is efficient and stable in performance; moreover, the mining result from adaptive approximation (and approximation adjustment) achieves high accuracy.
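The idea of approximating an itemset's count from the counts of its subsets can be illustrated with a small sketch. This is not the paper's algorithm: it only computes elementary inclusion-exclusion bounds, and the function names `exact_counts` and `count_bounds`, the toy window, and the representation of counts as a dictionary keyed by `frozenset` are all assumptions made for illustration. For an itemset X and two of its subsets A and B with A ∪ B = X, the count of X is at most min(count(A), count(B)) and at least count(A) + count(B) − count(A ∩ B), where the count of the empty set is the window size.

```python
from itertools import combinations


def exact_counts(window, max_size):
    """Count every itemset of size <= max_size by scanning the window.

    Used here only to build the subset counts for the demo; the empty
    itemset maps to the number of transactions in the window.
    """
    counts = {frozenset(): len(window)}
    for transaction in window:
        for k in range(1, max_size + 1):
            for sub in combinations(sorted(transaction), k):
                key = frozenset(sub)
                counts[key] = counts.get(key, 0) + 1
    return counts


def count_bounds(itemset, counts):
    """Return (lower, upper) bounds on the count of `itemset`.

    Only the counts of its (k-1)-subsets and (k-2)-subsets are consulted;
    no transaction is scanned. Hypothetical helper, not the paper's method.
    """
    itemset = frozenset(itemset)
    if len(itemset) < 2:
        raise ValueError("need an itemset with at least two items")
    subsets = [itemset - {item} for item in itemset]
    # A superset can never occur more often than any of its subsets.
    upper = min(counts.get(s, 0) for s in subsets)
    # Any two distinct (k-1)-subsets A, B satisfy A | B == itemset, so
    # count(itemset) >= count(A) + count(B) - count(A & B).
    lower = 0
    for a, b in combinations(subsets, 2):
        bound = counts.get(a, 0) + counts.get(b, 0) - counts.get(a & b, 0)
        lower = max(lower, bound)
    return lower, upper


# Toy window of five transactions (illustrative data only).
window = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
counts = exact_counts(window, max_size=2)   # counts of 0-, 1- and 2-itemsets
bounds_abc = count_bounds({"a", "b", "c"}, counts)
true_abc = sum(1 for t in window if {"a", "b", "c"} <= t)
```

In this toy window the true count of {a, b, c} falls inside the computed interval. An approximate miner would still need some rule for choosing a single estimate within the interval; how the paper makes that choice adaptively, and how it adjusts the result, is not specified in the abstract.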
