WOLAP: Wavelet-Based Range Aggregate Query Processing

The Discrete Wavelet Transform has emerged as an elegant tool for data analysis queries. Most prior studies in this area required the data to be compressed; that restriction was lifted only when we proposed ProPolyne, a wavelet technique for fast exact, approximate, or progressive polynomial aggregate query processing. In this paper, we first review ProPolyne in more depth, with more intuitive and practical discussion. We then address its inefficiency on scientific datasets, which stems from cube sparseness, and propose a new cube model, CFM, that improves both the space and query efficiency of ProPolyne. Whereas ProPolyne assumes that the data is stored as large data frequency distribution cubes, CFM organizes the data as a collection of smaller fixed measure cubes to reduce the overall query and storage costs. We combine both cube models in an integrated framework, called WOLAP, for efficient polynomial aggregate query processing. We further enhance WOLAP by proposing practical solutions for real-world deployment in scientific applications. In particular, we show how to incorporate data approximation, how to improve wavelet filter selection, and how to work with datacubes of arbitrary domain sizes.
