Single-pass algorithms for mining frequency change patterns with limited space in evolving append-only and dynamic transaction data streams

We propose MFC-append (mining frequency change patterns in append-only data streams), an online single-pass algorithm for mining frequent frequency-change items in continuous append-only data streams. An online, space-efficient data structure called Change-Sketch is developed to provide fast response times when computing dynamic frequency changes between data streams. A modified approach, MFC-dynamic (mining frequency change patterns in dynamic data streams), is also presented to mine frequency changes in dynamic data streams. Theoretical analyses show that our algorithms meet the major performance requirements of streaming data mining: single-pass processing, bounded storage, and real-time response.
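
The abstract does not describe how Change-Sketch works internally. The following is a minimal sketch, under stated assumptions, of the general idea it names: single-pass updates, storage bounded independently of stream length, and an answer available at any time for the frequency change f_new(x) - f_old(x) of an item between two stream blocks. It uses a swapped-in, well-known technique, the linear Count-Sketch of Charikar, Chen and Farach-Colton. It is not the authors' Change-Sketch, MFC-append, or MFC-dynamic. The class name `DifferenceCountSketch`, the `width`/`depth`/`seed` parameters, and the choice to weight the old block by -1 and the new block by +1 are hypothetical choices made for illustration.

```python
import hashlib
import random
import statistics


class DifferenceCountSketch:
    """Count-Sketch over the signed difference f_new(x) - f_old(x) of two blocks.

    Assumption: a generic linear sketch for illustration, NOT the paper's
    Change-Sketch. Space is depth * width counters regardless of stream length,
    and each update touches exactly one counter per row.
    """

    def __init__(self, width: int = 256, depth: int = 5, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        # One random salt per row yields a separate hash function per row.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]

    def _bucket_and_sign(self, row: int, item: str) -> tuple[int, int]:
        data = f"{self.salts[row]}:{item}".encode()
        h = int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")
        bucket = h % self.width               # low bits pick the counter
        sign = 1 if (h >> 63) & 1 else -1     # top bit picks the +1/-1 sign
        return bucket, sign

    def update(self, item: str, delta: int) -> None:
        # Linear update: an insertion is +delta and a deletion is -delta.
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            self.table[row][bucket] += sign * delta

    def estimate_change(self, item: str) -> int:
        # The median of the per-row signed counters estimates f_new - f_old.
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(row, item)
            estimates.append(sign * self.table[row][bucket])
        return statistics.median(estimates)


OLD, NEW = -1, +1  # hypothetical weighting: old block negative, new block positive

sketch = DifferenceCountSketch(width=256, depth=5, seed=42)
for _ in range(10):
    sketch.update("x", OLD)
for _ in range(100):
    sketch.update("x", NEW)
for _ in range(50):
    sketch.update("y", OLD)
for i in range(20):  # items equally frequent in both blocks cancel exactly
    sketch.update(f"n{i}", OLD)
    sketch.update(f"n{i}", NEW)

change_x = sketch.estimate_change("x")  # true change: 100 - 10 = 90
change_y = sketch.estimate_change("y")  # true change: 0 - 50 = -50

# Dynamic stream: deleting 10 of x's new-block transactions is a negative update.
for _ in range(10):
    sketch.update("x", -NEW)
change_x_after = sketch.estimate_change("x")  # true change: 90 - 10 = 80
```

Because the sketch is linear, a deletion in a dynamic stream is simply a negative update, so in this illustration the same structure covers both the append-only and the dynamic settings. Reporting the items with the largest |change| would additionally require tracking candidate items (for example with a small heap), which is omitted here.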
