Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window

Mining frequent itemsets has been widely studied over the last decade, but past research has focused mainly on static databases. In many newer applications, data flow continuously through the Internet or sensor networks, and extending the mining techniques to such a dynamic environment is challenging. The main challenges are responding quickly to continuous requests, maintaining a compact summary of the data stream, and adapting to limited resources. In this paper, we develop a novel approach for mining frequent itemsets from data streams based on a time-sensitive sliding window model. Our approach consists of a storage structure that captures all possible frequent itemsets and a table that provides approximate counts of the expired data items; the size of this table can be adjusted to the available storage space. Experimental results show that both the execution time and the storage space of our approach remain small under various parameter settings. In addition, our approach guarantees either no false alarms or no false dismissals in the results it yields.
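
To make the time-sensitive sliding window model concrete, the following is a minimal sketch, not the storage structure or approximate-count table described in the abstract. It assumes that transactions arrive tagged with a non-decreasing integer time unit, that the window covers the most recent `window_units` time units, and that itemsets are counted exactly up to a small length `max_len`. The class name `TimeSensitiveWindow` and all of its parameters are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque
from itertools import combinations


class TimeSensitiveWindow:
    """Exact itemset counts over the last `window_units` time units.

    Illustrative only: the paper keeps approximate counts for expired
    data, whereas this sketch stores one exact counter per time unit.
    """

    def __init__(self, window_units, max_len=2):
        self.window_units = window_units
        self.max_len = max_len
        self.blocks = deque()    # each entry: [time_unit, Counter, n_transactions]
        self.counts = Counter()  # itemset counts over the whole window
        self.n_transactions = 0

    def _expire(self, now):
        # Drop every block that has slid out of [now - window_units + 1, now].
        while self.blocks and self.blocks[0][0] <= now - self.window_units:
            _, block_counts, n = self.blocks.popleft()
            self.counts.subtract(block_counts)
            self.n_transactions -= n

    def add(self, time_unit, transaction):
        # Assumes time_unit never decreases between calls.
        self._expire(time_unit)
        if not self.blocks or self.blocks[-1][0] != time_unit:
            self.blocks.append([time_unit, Counter(), 0])
        block = self.blocks[-1]
        items = sorted(set(transaction))
        for k in range(1, self.max_len + 1):
            for itemset in combinations(items, k):
                block[1][itemset] += 1
                self.counts[itemset] += 1
        block[2] += 1
        self.n_transactions += 1

    def frequent(self, min_support, now):
        # Itemsets whose count reaches min_support * (transactions in window).
        self._expire(now)
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c > 0 and c >= threshold}


window = TimeSensitiveWindow(window_units=2)
window.add(1, ["a", "b"])
window.add(1, ["a", "c"])
window.add(2, ["a", "b"])
print(window.frequent(0.6, now=2))
```

Keeping one counter per time unit lets a whole unit expire in a single step when the window slides, which is what makes the window time-sensitive rather than transaction-count-based. The cost is that memory grows with the number of distinct itemsets per unit, which is the pressure that motivates a compact, size-adjustable summary.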
