Optimal workload-based weighted wavelet synopses

In recent years, wavelets have been shown to be effective data synopses. We are concerned with the problem of efficiently finding wavelet synopses for massive data sets in situations where information about the query workload is available. We present linear-time, I/O-optimal algorithms for building optimal workload-based wavelet synopses for point queries. The synopses are based on a novel construction of weighted inner products and use weighted wavelets that are adapted to those products. The synopses are optimal in the sense that the subset of retained coefficients is the best possible for the bases in use with respect to either the mean-squared absolute error or the mean-squared relative error. For the latter, this is the first optimal wavelet synopsis even for the regular, non-workload-based case. Experimental results demonstrate the advantage obtained by the new optimal wavelet synopses, as well as the robustness of the synopses to deviations in the actual query workload.
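
The idea of a basis adapted to a weighted inner product can be illustrated with a minimal sketch. It assumes strictly positive per-point workload weights, a data length that is a power of two, and an unbalanced Haar-style basis made orthonormal under the inner product sum_i w_i u_i v_i. The function names (`weighted_haar_basis`, `build_synopsis`, `reconstruct`) are hypothetical, and the construction is a naive quadratic-space illustration, not the linear-time, I/O-optimal algorithm the abstract refers to. Under these assumptions, Parseval's identity makes the weighted squared error equal to the sum of squares of the dropped coefficients, so keeping the largest-magnitude coefficients is the best choice for this basis.

```python
import math

def weighted_inner(u, v, weights):
    """Weighted inner product: sum_i w_i * u_i * v_i."""
    return sum(w * x * y for w, x, y in zip(weights, u, v))

def weighted_haar_basis(weights):
    """Haar-style basis, orthonormal under the weighted inner product.

    Assumes len(weights) is a power of two and all weights are > 0.
    Each basis vector is stored explicitly (illustration only).
    """
    n = len(weights)
    total = sum(weights)
    basis = [[1.0 / math.sqrt(total)] * n]  # scaling (constant) function
    size = n
    while size >= 2:
        half = size // 2
        for start in range(0, n, size):
            wl = sum(weights[start:start + half])         # weight of left half
            wr = sum(weights[start + half:start + size])  # weight of right half
            # Heights chosen so the function has weighted mean 0 and norm 1.
            a = math.sqrt(wr / (wl * (wl + wr)))
            b = math.sqrt(wl / (wr * (wl + wr)))
            psi = [0.0] * n
            for i in range(start, start + half):
                psi[i] = a
            for i in range(start + half, start + size):
                psi[i] = -b
            basis.append(psi)
        size = half
    return basis

def build_synopsis(data, weights, m):
    """Keep the m coefficients of largest magnitude in the weighted basis."""
    basis = weighted_haar_basis(weights)
    coeffs = [weighted_inner(data, psi, weights) for psi in basis]
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    return basis, coeffs, sorted(order[:m])

def reconstruct(basis, coeffs, keep, n):
    """Approximate the data from the retained coefficients only."""
    approx = [0.0] * n
    for k in keep:
        for i in range(n):
            approx[i] += coeffs[k] * basis[k][i]
    return approx

if __name__ == "__main__":
    data = [2.0, 2.0, 0.0, 2.0, 3.0, 5.0, 4.0, 4.0]
    weights = [4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]  # assumed query frequencies
    basis, coeffs, keep = build_synopsis(data, weights, m=3)
    approx = reconstruct(basis, coeffs, keep, len(data))
    err = sum(w * (d - a) ** 2 for w, d, a in zip(weights, data, approx))
    dropped = sum(c * c for k, c in enumerate(coeffs) if k not in keep)
    print(err, dropped)  # the two values agree up to rounding
```

In this sketch the weights stand in for point-query frequencies, which corresponds to the absolute-error case; how the weights would be chosen for the relative-error case is not specified in the abstract and is left out here.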
