Analytical Synopses for Approximate Query Answering in OLAP Environments

In this paper we present a technique for approximate aggregate query answering in OLAP environments, the most common application interfaces for a Data Warehouse Server (DWS). The technique combines an analytical interpretation of multi-dimensional data with the well-known Least Squares Approximation (LSA) method. It builds data synopses by interpreting the original data distribution as a set of discrete functions. These synopses, called Δ-Syn, are obtained by approximating the data with a set of polynomials and storing the polynomial coefficients instead of the original data. Queries are then evaluated on this compressed representation, which reduces the number of disk accesses needed to compute the answer. We also report experimental results on several kinds of synthetic OLAP data cubes.
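
The idea of storing least-squares polynomial coefficients in place of the data can be illustrated with a minimal one-dimensional sketch. This is not the Δ-Syn construction itself, which the abstract does not detail and which targets multi-dimensional cubes. The function names `build_synopsis` and `approx_range_sum`, the polynomial degree, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def build_synopsis(values, degree):
    """Least-squares fit of one polynomial to a 1-D discrete distribution.

    Returns degree + 1 coefficients (lowest degree first); these stand in
    for the original cells. Hypothetical helper, not the paper's method.
    """
    x = np.arange(len(values))
    return P.polyfit(x, values, degree)


def approx_range_sum(coeffs, lo, hi):
    """Estimate SUM over cells lo..hi (inclusive) using only the coefficients."""
    x = np.arange(lo, hi + 1)
    return float(P.polyval(x, coeffs).sum())


# Synthetic 64-cell distribution: a quadratic trend plus noise (assumed data).
rng = np.random.default_rng(0)
cells = np.arange(64)
data = 100 + 3 * cells + 0.05 * cells**2 + rng.normal(0, 5, size=64)

coeffs = build_synopsis(data, degree=2)      # 3 numbers instead of 64
exact = float(data[10:41].sum())             # answer from the raw cells
approx = approx_range_sum(coeffs, 10, 40)    # answer from the synopsis

print(f"exact={exact:.1f} approx={approx:.1f} "
      f"relative error={abs(approx - exact) / exact:.4f}")
```

In this sketch the range-sum query touches only three stored coefficients rather than the 31 cells in the range, which mirrors the reduction in data accesses described above. The accuracy depends on how well a low-degree polynomial fits the distribution.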
