Approximately Processing Multi-granularity Aggregate Queries over Data Streams

Aggregate monitoring over data streams is attracting growing attention in the research community because of its broad range of potential applications. Existing methods suffer from two problems: 1) the aggregate functions that can be monitored are restricted to first-order statistics or to functions that are monotonic with respect to the window size; 2) only a limited number of granularities and time scales can be monitored over a stream, so some interesting patterns may be missed and users may be misled by an incomplete profile of how the current data streams are changing. These two problems impede the development of online mining techniques over data streams, and a breakthrough is needed. In this paper, we employ fractal analysis to enable the monitoring of both monotonic and non-monotonic aggregates on time-changing data streams. The monotonicity property of aggregate monitoring is revealed, and a monotonic search space is built to reduce the time overhead of accessing the synopsis from O(m) to O(log m), where m is the number of windows to be monitored. With the help of a novel inverted histogram, the statistical summary is compressed to fit in limited main memory, so that high aggregates on windows of any length can be detected accurately and efficiently online. Theoretical analysis shows that the space and time complexity bounds of this method are relatively low, while experimental results demonstrate the applicability and efficiency of the proposed algorithm in different application settings.
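The O(m) to O(log m) reduction can be illustrated with a generic sketch. It is not the paper's synopsis structure or its inverted histogram. It assumes only the simplest monotonic case: a sum over non-negative values, so the aggregate over the most recent w items never decreases as w grows. The function names and the sample stream are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def suffix_sums(stream):
    """Sums over the most recent 1, 2, ..., n items.

    For non-negative values this list is non-decreasing,
    which is the monotonicity the search below relies on.
    """
    return list(accumulate(reversed(stream)))


def shortest_window_linear(sums, threshold):
    """O(m) baseline: scan every window length in turn."""
    for i, s in enumerate(sums):
        if s >= threshold:
            return i + 1
    return None


def shortest_window_binary(sums, threshold):
    """O(log m): binary search over the monotonic list of window sums."""
    i = bisect_left(sums, threshold)
    return i + 1 if i < len(sums) else None


stream = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical non-negative readings
sums = suffix_sums(stream)
print(shortest_window_binary(sums, 20))
```

Both functions return the shortest window, counted back from the newest item, whose sum reaches the threshold. Only the binary search exploits the ordering, and it stops being valid as soon as the aggregate is not monotonic in the window length.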
