Plot Query Processing with Wavelets

Plots are among the most important and widely used tools for scientific data analysis and visualization. In a plot (a.k.a. a range group-by query), data are divided into a number of groups, and within each group they are summarized over one or more attributes for a given arbitrary range. Wavelets, on the other hand, allow efficient computation of individual exact and approximate aggregations. In current practice, generating a plot over a wavelet-transformed dataset requires executing one aggregate query per plot point; hence, for large plots (containing numerous points), a large number of aggregate queries are submitted to the database. In contrast, we redefine a plot as a range group-by query and propose a wavelet-based technique that exploits I/O sharing across plot points to evaluate the plot efficiently and progressively. The intuition behind our approach is that a plot query can be decomposed into two sets of queries: 1) aggregate queries and 2) reconstruction queries. We then exploit and extend our earlier related studies to compute both sets of queries effectively in the wavelet domain. We also show that our technique is not only efficient as an exact algorithm but also very effective as an approximation method when either the query time or the storage space is limited.
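The I/O-sharing idea can be illustrated with a small sketch. It is not the algorithm proposed in the paper: it is a minimal one-dimensional example that assumes an orthonormal Haar transform, SUM as the aggregate, and a dictionary standing in for disk reads. The names `haar` and `plot_range_sums` are hypothetical. Each plot point is a range-sum answered as an inner product in the wavelet domain, and the sketch counts how many coefficient fetches are needed when every point is evaluated on its own versus when fetched coefficients are shared across points.

```python
import math


def haar(values):
    """Orthonormal Haar transform of a list whose length is a power of two."""
    out = [float(v) for v in values]
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + dif  # averages first, then details of this level
        n = half
    return out


def plot_range_sums(data, lo, hi, group_size):
    """Return (plot points, naive fetch count, shared fetch count).

    Each plot point is the SUM of `data` over one group of `group_size`
    consecutive positions inside [lo, hi].
    """
    stored = haar(data)   # stands in for the wavelet-transformed dataset on disk
    fetched = {}          # coefficients read so far, shared by all plot points
    naive_fetches = 0     # cost if every plot point were an independent query
    points = []
    for start in range(lo, hi + 1, group_size):
        end = min(start + group_size - 1, hi)
        # Indicator vector of the group's range, transformed to the wavelet domain.
        query = haar([1.0 if start <= i <= end else 0.0 for i in range(len(data))])
        needed = [i for i, c in enumerate(query) if abs(c) > 1e-12]
        naive_fetches += len(needed)
        for i in needed:
            fetched.setdefault(i, stored[i])  # read each coefficient at most once
        # The transform is orthonormal, so this inner product equals the range sum.
        points.append(sum(query[i] * fetched[i] for i in needed))
    return points, naive_fetches, len(fetched)


if __name__ == "__main__":
    pts, naive, shared = plot_range_sums([3, 1, 4, 1, 5, 9, 2, 6], 0, 7, 2)
    print([round(p, 6) for p in pts], naive, shared)
```

In this toy setting, neighbouring plot points need many of the same coarse-level coefficients, so the shared count is smaller than the naive count. The sketch covers only the aggregate-query side; the reconstruction queries mentioned in the abstract are not modelled here.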
