A data cube is a popular organization for summary data. A cube is simply a multidimensional structure that stores at each cell an aggregate value, i.e., the result of applying an aggregate function to an underlying relation. In practice, cubes can require a large amount of storage. The typical approach to reducing storage cost is to materialize only parts of the cube and compute the rest on demand. Unfortunately, this lazy evaluation can be time-consuming.
In this paper, we describe an approximation technique that reduces the storage cost of the cube without incurring the run-time cost of lazy evaluation. The idea is to provide an incomplete description of the cube together with a method for estimating the missing entries to within a certain level of accuracy. The description should take a fraction of the space of the full cube, and the estimation procedure should be faster than computing the data from the underlying relations. Since cubes are used to support data analysis, and analysts are usually interested in trends rather than in the precise values of the aggregates, approximate answers are, in most cases, a satisfactory compromise.
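As an illustration of what an incomplete description with an estimation procedure might look like, the sketch below compresses a two-dimensional cube into its row and column totals and keeps exact values only for cells the estimate misses by more than a relative tolerance. The independence model (row total times column total divided by grand total), the function names `compress` and `estimate_cell`, and the tolerance rule are all assumptions made for this example. They are not taken from the paper, which does not specify its model here.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Summary = Tuple[List[float], List[float], float, Dict[Cell, float]]


def compress(cube: List[List[float]], tol: float) -> Summary:
    """Summarize a 2-D cube by its marginals plus the cells the model misses.

    Hypothetical model: cell (i, j) is estimated as row_i * col_j / total.
    """
    row_sums = [sum(row) for row in cube]
    col_sums = [sum(col) for col in zip(*cube)]
    total = sum(row_sums)
    retained: Dict[Cell, float] = {}
    for i, row in enumerate(cube):
        for j, value in enumerate(row):
            estimate = row_sums[i] * col_sums[j] / total if total else 0.0
            # Keep the exact value only where the relative error exceeds tol.
            if abs(value - estimate) > tol * max(abs(value), 1.0):
                retained[(i, j)] = value
    return row_sums, col_sums, total, retained


def estimate_cell(summary: Summary, i: int, j: int) -> float:
    """Return the retained exact value if present, else the model estimate."""
    row_sums, col_sums, total, retained = summary
    if (i, j) in retained:
        return retained[(i, j)]
    return row_sums[i] * col_sums[j] / total if total else 0.0
```

When the cube is close to the assumed model, few cells need to be retained and the summary is much smaller than the cube. When it is not, most cells are retained and little is saved, so the choice of model matters.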
Alternatively, the technique can be used to implement a multiresolution system in which the execution time of a query is traded off against the error the user is willing to tolerate. By going to disk only when necessary to reduce the error, the query can be executed faster. This idea can be extended to produce a system that incrementally increases the accuracy of the answer while the user is looking at it, thereby supporting on-line aggregation.
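The tradeoff between execution time and tolerated error can be sketched as follows, assuming each chunk of the cube comes with an estimated sum and a bound on the absolute error of that estimate. The function `answer_sum`, the callback `fetch_exact` (standing in for a disk read), and the policy of refining the least accurate chunks first are hypothetical choices for this example, not the paper's algorithm.

```python
from typing import Callable, List, Tuple

# Each chunk carries (estimated sum, bound on the estimate's absolute error).
Chunk = Tuple[float, float]


def answer_sum(
    chunks: List[Chunk],
    fetch_exact: Callable[[int], float],
    tolerance: float,
) -> Tuple[float, float, int]:
    """Return (answer, error_bound, disk_reads) for a SUM over all chunks."""
    values = [est for est, _ in chunks]
    bounds = [err for _, err in chunks]
    reads = 0
    # Visit chunks from the largest error bound to the smallest.
    order = sorted(range(len(chunks)), key=lambda k: bounds[k], reverse=True)
    for idx in order:
        if sum(bounds) <= tolerance:
            break  # the answer is already accurate enough; skip the disk
        values[idx] = fetch_exact(idx)  # simulated disk access
        bounds[idx] = 0.0
        reads += 1
    return sum(values), sum(bounds), reads


if __name__ == "__main__":
    exact = [103.0, 200.5, 45.0]
    chunks = [(100.0, 5.0), (200.0, 1.0), (50.0, 10.0)]
    print(answer_sum(chunks, lambda k: exact[k], tolerance=6.0))
```

A looser tolerance leads to fewer simulated disk reads. Reporting the running answer after each read, instead of only at the end, would give the incrementally improving behavior described above.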