This dissertation describes techniques for speeding up Online Analytical Processing (OLAP) queries. OLAP systems allow users to quickly obtain answers to complex business queries. Because these queries aggregate large amounts of data, answering them quickly calls for specialized techniques. One technique used by OLAP systems to speed up multidimensional data analysis is to precompute aggregates on some subsets of dimensions and their corresponding hierarchies.
We first address the problem of efficiently estimating aggregate sizes. Precomputing aggregate data improves query response time, but deciding what and how much to precompute is difficult. The decision is further complicated by the fact that precomputation in the presence of hierarchies can result in an unintuitively large increase in the amount of storage required by the database. Hence, it is both interesting and useful to estimate the storage blowup that will result from a proposed set of precomputations without actually computing them. We propose three strategies to solve this problem, and investigate their accuracy in estimating the blowup for different data distributions and database schemas.
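To make the notion of blowup concrete, the following is a minimal sketch of one standard analytic estimate, assuming tuples fall uniformly and independently over the possible cells of each group-by. It is not necessarily one of the three strategies studied here. The function names and the example cardinalities are hypothetical, and hierarchies are left out for brevity.

```python
from itertools import combinations
from math import prod


def expected_groupby_size(num_rows: int, num_cells: int) -> float:
    """Expected number of non-empty cells when num_rows tuples fall
    uniformly and independently into num_cells possible cells."""
    if num_cells <= 0:
        return 0.0
    return num_cells * (1.0 - (1.0 - 1.0 / num_cells) ** num_rows)


def estimate_blowup(num_rows: int, cardinalities: dict) -> float:
    """Estimated total size of all group-bys divided by the base table size.

    cardinalities maps a (hypothetical) dimension name to its number of
    distinct values, which is assumed to be known in advance.
    """
    dims = list(cardinalities)
    total = 0.0
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):
            # The empty subset has exactly one cell (the grand total).
            cells = prod(cardinalities[d] for d in subset)
            total += expected_groupby_size(num_rows, cells)
    return total / num_rows


if __name__ == "__main__":
    # Hypothetical schema: three dimensions and one million fact rows.
    print(estimate_blowup(1_000_000, {"product": 1000, "store": 200, "day": 365}))
```

The estimate needs only the row count and the dimension cardinalities, so it can be evaluated for a proposed set of precomputations without materializing any of them. Skewed data would violate the uniformity assumption and make the estimate too high.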
A second problem is deciding which aggregates to precompute. The more that is precomputed, the faster queries can be answered; however, it is often difficult to determine which aggregates are best to precompute given a fixed amount of space. We study the structure of the precomputation problem and show that, under certain broad conditions on the multidimensional data, a simple and fast algorithm, PBS, achieves good performance bounds. We present an empirical study demonstrating that PBS picks a surprisingly good set of aggregates even when the conditions do not hold.
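The space-constrained selection problem can be illustrated with a small sketch. The abstract does not spell out how PBS works, so the code below is only an assumed example of a simple size-ordered heuristic, not the algorithm as analyzed in the dissertation. The function name `pick_by_budget` and the aggregate names are hypothetical.

```python
def pick_by_budget(sizes: dict, budget: int) -> list:
    """Choose aggregates in increasing size order until the budget is used.

    sizes maps an aggregate name to its estimated size (for example, rows).
    Returns the chosen names in the order they were picked.
    """
    chosen = []
    remaining = budget
    # Sort by size, breaking ties by name so the result is deterministic.
    for name, size in sorted(sizes.items(), key=lambda kv: (kv[1], kv[0])):
        if size > remaining:
            break  # every later aggregate is at least this large
        chosen.append(name)
        remaining -= size
    return chosen


if __name__ == "__main__":
    # Hypothetical aggregate sizes and a space budget of 10 units.
    print(pick_by_budget({"a": 5, "b": 1, "ab": 50, "none": 1}, 10))
```

A heuristic of this shape runs in time dominated by the sort, which is what makes it attractive compared with selection methods that re-evaluate the benefit of every candidate after each pick.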
Queries in real-world applications frequently require aggregations over multiple cubes (in a star schema, this corresponds to having multiple fact tables). Unfortunately, most research into aggregate selection has assumed that queries are over a single cube. We analyze aggregate selection in the context of multicube queries, and propose algorithms that perform significantly better than previously proposed algorithms on multicube workloads, without any deterioration in performance on single-cube query workloads.