Hierarchically compressed wavelet synopses

The wavelet decomposition is a proven tool for constructing concise synopses of large data sets that can be used to obtain fast approximate answers to queries. Existing research focuses on selecting an optimal set of wavelet coefficients to store so as to minimize some error metric, but does not attempt to reduce the storage required for the coefficients themselves. In many real data sets, large spikes in the data values produce many large coefficient values lying along paths of a conceptual tree structure known as the error tree. To exploit this fact, in this paper we introduce a novel compression scheme for wavelet synopses, termed hierarchically compressed wavelet synopses, that exploits hierarchical relationships among coefficients in order to reduce their storage. For a given space constraint, our scheme allows a larger number of coefficients to be stored, thereby increasing the accuracy of the resulting synopsis. We propose optimal, approximate, and greedy algorithms for constructing hierarchically compressed wavelet synopses that minimize the sum of squared errors without exceeding a given space budget. Extensive experimental results on both synthetic and real-world data sets validate our compression scheme and demonstrate the effectiveness of our algorithms compared with existing synopsis construction algorithms.
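The abstract relies on three background notions: the Haar decomposition, the error tree, and conventional coefficient thresholding under a sum-squared-error objective. The Python sketch below illustrates them under stated assumptions: an unnormalized Haar transform (pairwise averages and half-differences), coefficients stored in error-tree order so that the children of c[i] are c[2i] and c[2i+1], and the standard greedy rule for conventional synopses (keep the B coefficients with the largest weighted squared value). All function names are hypothetical, and this is not the paper's hierarchical compression algorithm; `count_chained` merely counts retained coefficients whose parent is also retained, to show the kind of path structure the abstract describes.

```python
def haar_decompose(data):
    """Unnormalized Haar decomposition (pairwise averages and half-differences).

    Returns coefficients in error-tree order: c[0] is the overall average,
    c[1] is the root detail, and the children of detail c[i] are c[2i], c[2i+1].
    """
    n = len(data)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    coeffs = [0.0] * n
    current = [float(x) for x in data]
    while len(current) > 1:
        half = len(current) // 2
        averages = [(current[2 * k] + current[2 * k + 1]) / 2 for k in range(half)]
        details = [(current[2 * k] - current[2 * k + 1]) / 2 for k in range(half)]
        coeffs[half:2 * half] = details  # this level's details occupy [half, 2*half)
        current = averages
    coeffs[0] = current[0]
    return coeffs


def haar_reconstruct(coeffs):
    """Invert haar_decompose; zeroed (dropped) coefficients contribute nothing."""
    current = [coeffs[0]]
    while len(current) < len(coeffs):
        half = len(current)
        details = coeffs[half:2 * half]
        # Each (average, detail) pair expands to (a + d, a - d) one level down.
        current = [v for a, d in zip(current, details) for v in (a + d, a - d)]
    return current


def basis_weight(i, n):
    """Squared L2 norm of the Haar basis function attached to coefficient c[i]."""
    if i == 0:
        return n  # the average contributes to all n values
    level = i.bit_length() - 1  # c[1] is level 0, c[2..3] level 1, ...
    return n / 2 ** level  # support size; the function is +1/-1 on its support


def conventional_synopsis(coeffs, budget):
    """Keep the `budget` coefficients with the largest weighted squared value.

    Returns (coefficient list with dropped entries zeroed, set of kept indices).
    Because the Haar basis functions are orthogonal, the SSE of a synopsis is the
    weighted sum of squares of the dropped coefficients, so this greedy choice is
    SSE-optimal among synopses that store `budget` unmodified coefficients.
    """
    n = len(coeffs)
    ranked = sorted(range(n), key=lambda i: coeffs[i] ** 2 * basis_weight(i, n), reverse=True)
    kept = set(ranked[:budget])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)], kept


def sse(a, b):
    """Sum of squared errors between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_chained(kept):
    """Count retained coefficients whose error-tree parent (i // 2) is also retained."""
    return sum(1 for i in kept if i >= 1 and i // 2 in kept)


# A single spike: every nonzero coefficient lies on one error-tree path (0 -> 1 -> 3 -> 7).
data = [2, 2, 2, 2, 2, 2, 50, 2]
coeffs = haar_decompose(data)
print(coeffs)  # [8.0, -6.0, 0.0, -12.0, 0.0, 0.0, 0.0, 24.0]
approx, kept = conventional_synopsis(coeffs, budget=3)
print(sorted(kept), sse(data, haar_reconstruct(approx)), count_chained(kept))  # [0, 3, 7] 288.0 1
```

With `budget=3` the conventional synopsis drops c[1] and keeps c[0], c[3] and c[7]; with `budget=4` all four path coefficients are kept, the SSE is zero, and three of them have a retained parent. Chains like this are what a hierarchical encoding could store more compactly than independent (index, value) pairs; the abstract does not specify the paper's actual encoding, so this sketch only illustrates the opportunity.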