Maintaining Wavelet Synopses for Sliding-Window Aggregates

The IoT era has shifted computing from traditional high-end servers to "edge" devices with limited processing and memory capabilities. These devices, together with sensors, routinely produce very high data volumes. For many real-time applications, storing and indexing an unbounded stream may not be an option. It is therefore important to design algorithms and systems that work at the edge of the network and can answer queries on distributed, streaming data. Moreover, in many streaming scenarios, fresh data tend to be prioritized. The sliding-window model is an important case of stream processing, in which only the most recent elements remain active and the rest are discarded. In this work, we study the problem of maintaining basic aggregate statistics over a sliding-window data stream under the constraint of limited memory. Because the available memory in IoT scenarios is typically much smaller than the window size, queries are answered from compact synopses that are maintained in an online fashion. To construct such synopses efficiently, we propose wavelet-based algorithms that provide deterministic guarantees and produce almost exact results. Our algorithms work on any kind of numerical data and do not share the positive-numbers constraint of techniques such as exponential histograms. Our experimental evaluation indicates that, in terms of accuracy and space efficiency, our solution outperforms the exponential histograms and deterministic waves techniques.
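To make the idea of a wavelet synopsis concrete, the following is a minimal sketch of a textbook Haar decomposition applied to one static window. It is not the algorithm proposed in this work: it ignores the online, sliding-window maintenance entirely, and the function names (`haar_decompose`, `keep_largest`, `haar_reconstruct`) and the `budget` parameter are hypothetical. It only illustrates how a window can be summarized by a few coefficients and why negative values pose no difficulty.

```python
import math
from typing import List


def haar_decompose(values: List[float]) -> List[float]:
    """Unnormalized Haar transform: [overall average, details coarse-to-fine]."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("window length must be a power of two")
    current = list(values)
    details: List[float] = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels end up first
        current = averages
    return current + details


def keep_largest(coeffs: List[float], budget: int) -> List[float]:
    """Assumed thresholding rule: keep the `budget` coefficients with the
    largest normalized magnitude and zero out the rest."""
    n = len(coeffs)

    def weight(i: int) -> float:
        level = 0 if i == 0 else i.bit_length() - 1
        return abs(coeffs[i]) * math.sqrt(n / 2 ** level)

    kept = set(sorted(range(n), key=weight, reverse=True)[:budget])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def haar_reconstruct(coeffs: List[float]) -> List[float]:
    """Invert haar_decompose, one resolution level at a time."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(current)]
        finer: List[float] = []
        for avg, diff in zip(current, level):
            finer.extend((avg + diff, avg - diff))
        pos += len(current)
        current = finer
    return current


# Usage: a window with negative values, summarized by 3 of its 8 coefficients.
window = [2.0, -2.0, 0.0, 2.0, 3.0, -5.0, 4.0, 4.0]
synopsis = keep_largest(haar_decompose(window), budget=3)
approx = haar_reconstruct(synopsis)
print("exact sum of last 4:", sum(window[-4:]))
print("approx sum of last 4:", sum(approx[-4:]))
```

In this sketch the sum over the whole window depends only on the first coefficient (the overall average), so it survives any thresholding that keeps that coefficient; sums over sub-ranges become approximate once detail coefficients are dropped.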
