The Haar+ Tree: A Refined Synopsis Data Structure

We introduce the Haar+ tree, a refined, wavelet-inspired data structure for synopsis construction. It offers two advantages. First, when summarizing data sets with sharp discontinuities, it achieves higher synopsis quality than state-of-the-art histogram and Haar wavelet techniques. Second, because it delimits the search space, Haar+ synopsis construction runs in time linear in the size of the data set for any monotonic distributive error metric. Experiments show that Haar+ synopses outperform histogram and Haar wavelet methods in both construction time and achieved quality for representative error metrics.
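For orientation, the classical Haar wavelet synopsis that serves as a baseline here can be sketched as follows. This is not the Haar+ tree itself, whose structure the abstract does not specify. It is only a minimal illustration of unnormalized Haar decomposition, a naive "keep the largest coefficients" synopsis, and a maximum absolute error measurement. All function names and the sample data are hypothetical, and the selection rule is a simple heuristic, not an error-optimal one.

```python
from typing import List


def haar_decompose(data: List[float]) -> List[float]:
    """Unnormalized Haar transform of a power-of-two-length sequence.

    Returns [overall average, detail coefficients ordered coarse to fine].
    """
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    averages = list(data)
    details: List[float] = []
    while len(averages) > 1:
        pairs = [(averages[i], averages[i + 1]) for i in range(0, len(averages), 2)]
        level = [(a - b) / 2 for a, b in pairs]      # pairwise half-differences
        averages = [(a + b) / 2 for a, b in pairs]   # pairwise averages
        details = level + details                    # coarser levels go in front
    return averages + details


def haar_reconstruct(coeffs: List[float]) -> List[float]:
    """Invert haar_decompose by expanding one resolution level at a time."""
    values = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        level = coeffs[pos:pos + len(values)]
        values = [v for avg, d in zip(values, level) for v in (avg + d, avg - d)]
        pos += len(level)
    return values


def keep_largest(coeffs: List[float], budget: int) -> List[float]:
    """Naive synopsis (an assumption for illustration): zero out all but the
    `budget` coefficients of largest absolute value."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:budget])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def max_abs_error(data: List[float], approx: List[float]) -> float:
    """Maximum absolute error between the data and its approximation."""
    return max(abs(x - y) for x, y in zip(data, approx))


if __name__ == "__main__":
    data = [2, 2, 0, 2, 3, 5, 4, 4]  # hypothetical sample data
    coeffs = haar_decompose(data)
    for budget in (1, 4, 8):
        approx = haar_reconstruct(keep_largest(coeffs, budget))
        print(budget, max_abs_error(data, approx))
```

Retaining all coefficients reproduces the data exactly. Shrinking the budget trades space for error, and that trade-off is what synopsis construction algorithms optimize under a chosen error metric.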
