On-line analytical processing (OLAP) requires efficient processing of complex decision support queries over very large databases. It is well accepted that pre-computed data cubes can dramatically reduce the response time of such queries. A key design issue for an efficient OLAP system is therefore the choice of which data cubes to materialize. We call this the data cube schema design problem. In this paper we show that finding an optimal data cube schema for an OLAP system with limited memory is NP-hard. As a more computationally efficient alternative, we propose a greedy approximation algorithm, cMP, and its variants. Algorithm cMP consists of two phases. In the first phase, it forms an initial schema consisting of all the cubes required to answer the user queries efficiently. In the second phase, it selectively merges cubes in the initial schema until the memory constraint is satisfied. We show that cMP prunes the search space for an optimal schema very effectively, which makes the algorithm highly efficient. We evaluate the efficiency and effectiveness of cMP in an empirical study using the TPC-D benchmark. Our results show that the data cube schemas generated by cMP enable very efficient OLAP query processing.
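The two-phase idea described above (form one cube per query, then merge cubes until the schema fits in memory) can be illustrated with a small sketch. This is not the cMP algorithm itself, whose details are not given here; it is a simplified greedy merge under stated assumptions. The function names, the size estimate (product of attribute cardinalities, capped at the number of fact-table rows), and the cost model (each query scans the smallest cube that covers it) are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import prod


def cube_size(cube, cardinality, fact_rows):
    """Assumed size estimate: product of cardinalities, capped at fact_rows."""
    return min(prod(cardinality[d] for d in cube), fact_rows)


def schema_size(schema, cardinality, fact_rows):
    """Total estimated size of all cubes in the schema."""
    return sum(cube_size(c, cardinality, fact_rows) for c in schema)


def query_cost(schema, queries, cardinality, fact_rows):
    """Assumed cost model: each query scans the smallest cube covering it."""
    total = 0
    for q in queries:
        q = frozenset(q)
        total += min(
            cube_size(c, cardinality, fact_rows) for c in schema if q <= c
        )
    return total


def greedy_schema(queries, cardinality, fact_rows, budget):
    # Phase 1: one cube per distinct set of attributes used by a query.
    schema = {frozenset(q) for q in queries}

    # Phase 2: merge pairs of cubes until the schema fits the budget.
    # A merged cube holds the union of attributes, so it still answers
    # every query that either of the original cubes could answer.
    while schema_size(schema, cardinality, fact_rows) > budget and len(schema) > 1:
        best_key, best_schema = None, None
        for a, b in combinations(schema, 2):
            candidate = (schema - {a, b}) | {a | b}
            # Prefer the merge with the lowest query cost, then smallest size.
            key = (
                query_cost(candidate, queries, cardinality, fact_rows),
                schema_size(candidate, cardinality, fact_rows),
            )
            if best_key is None or key < best_key:
                best_key, best_schema = key, candidate
        schema = best_schema
    return schema
```

Calling `greedy_schema(queries, cardinality, fact_rows, budget)` with a list of attribute sets, a dictionary of attribute cardinalities, a row count, and a memory budget returns a set of `frozenset` cubes. If the budget cannot be met, the sketch stops at a single cube containing every queried attribute rather than failing, which is a design choice of this illustration and not a claim about cMP.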