B-term approximation using tree-structured Haar transforms

We present a heuristic solution for B-term approximation using Tree-Structured Haar (TSH) transforms. The solution consists of two main stages: best basis selection and greedy approximation. When the same signal is approximated under different B constraints or error metrics, the solution can also trade additional storage space for a lower overall running time. We adopt a lattice structure to index basis vectors, so that a single index value fully specifies a basis vector. Building on the fast computation of the TSH transform by a butterfly network, we also develop an algorithm that derives the butterfly parameters directly, and we incorporate it into our solution. Results show that, when the error metric is the normalized ℓ1-norm or the normalized ℓ2-norm, the approximation quality of our solution is comparable to, and sometimes better than, that of prior data synopsis algorithms.
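To make the B-term idea concrete, here is a minimal sketch under stated assumptions. It is not the method of this paper: it uses the classical orthonormal Haar basis (the case of a balanced splitting tree) and skips both best basis selection and the butterfly-parameter derivation. It only illustrates the greedy step of keeping the B largest-magnitude coefficients, which minimizes the ℓ2 error for a fixed orthonormal basis. The function names `haar_forward`, `haar_inverse`, and `b_term_approximation` are hypothetical and chosen for illustration.

```python
import math


def haar_forward(signal):
    """Orthonormal Haar transform of a sequence whose length is a power of two."""
    coeffs = [float(x) for x in signal]
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = math.sqrt(2.0)
    while n > 1:
        half = n // 2
        # Scaled pairwise sums (coarse part) and differences (detail part).
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / s for i in range(half)]
        diff = [(coeffs[2 * i] - coeffs[2 * i + 1]) / s for i in range(half)]
        coeffs[:n] = avg + diff
        n = half
    return coeffs


def haar_inverse(coeffs):
    """Inverse of haar_forward."""
    out = list(coeffs)
    total = len(out)
    s = math.sqrt(2.0)
    n = 1
    while n < total:
        avg, diff = out[:n], out[n:2 * n]
        merged = []
        for a, d in zip(avg, diff):
            merged.extend([(a + d) / s, (a - d) / s])
        out[:2 * n] = merged
        n *= 2
    return out


def b_term_approximation(signal, b):
    """Keep the b largest-magnitude Haar coefficients and reconstruct."""
    coeffs = haar_forward(signal)
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:b])
    sparse = [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
    return haar_inverse(sparse)


if __name__ == "__main__":
    data = [4, 4, 4, 4, 0, 0, 0, 0]
    # A single step edge at the midpoint needs only two Haar terms.
    print(b_term_approximation(data, 2))
```

For error metrics other than ℓ2, keeping the largest coefficients is no longer guaranteed to be optimal, which is one reason a metric-aware greedy stage such as the one described above is of interest.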
