Implementing data cubes efficiently

Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, such as total sales. The values of many of these cells depend on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate which cells (views) to materialize when it is too expensive to materialize all of them. A lattice framework is used to express dependencies among views. We present greedy algorithms that operate on this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal under a variety of models. We then consider the most common case, the hypercube lattice, and examine the choice of materialized views for hypercubes in detail, giving some good tradeoffs between the space used and the average time to answer a query.
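To make the lattice-plus-greedy idea concrete, here is a minimal Python sketch. It is not the authors' code, and it rests on stated assumptions. Each view is identified by its set of group-by attributes, so the hypercube lattice over n dimensions has 2^n views, and a view w can be computed from a view v exactly when w's attributes are a subset of v's. Answering a query costs the row count of the smallest materialized view it can be computed from (a linear cost model). Every view is queried equally often, and the top (finest) view is always materialized. On that basis, the greedy loop repeatedly adds the view whose materialization most reduces total query cost. The function names (`greedy_select`, `benefit`, and so on) and the row counts in `EXAMPLE_SIZES` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# A view is identified by its set of group-by attributes.
View = FrozenSet[str]


def hypercube_lattice(dims: List[str]) -> List[View]:
    """Enumerate all 2^n group-by views, from the top view down to the empty grouping."""
    return [frozenset(c) for r in range(len(dims), -1, -1) for c in combinations(dims, r)]


def can_answer(v: View, w: View) -> bool:
    """In the hypercube lattice, w is computable from v iff w groups by a subset of v's attributes."""
    return w <= v


def query_cost(w: View, materialized: Set[View], size: Dict[View, int]) -> int:
    """Assumed linear cost model: rows in the smallest materialized view that can answer w."""
    return min(size[v] for v in materialized if can_answer(v, w))


def benefit(v: View, materialized: Set[View], views: List[View], size: Dict[View, int]) -> int:
    """Total cost reduction, over all views v can answer, if v were also materialized."""
    return sum(
        max(0, query_cost(w, materialized, size) - size[v])
        for w in views
        if can_answer(v, w)
    )


def total_cost(materialized: Set[View], views: List[View], size: Dict[View, int]) -> int:
    """Sum of query costs over all views (average query time times the number of views)."""
    return sum(query_cost(w, materialized, size) for w in views)


def greedy_select(dims: List[str], size: Dict[View, int], k: int) -> Set[View]:
    """Materialize the top view, then add up to k more, each time taking the largest benefit."""
    views = hypercube_lattice(dims)
    materialized: Set[View] = {frozenset(dims)}
    for _ in range(k):
        candidates = [v for v in views if v not in materialized]
        if not candidates:
            break
        materialized.add(max(candidates, key=lambda v: benefit(v, materialized, views, size)))
    return materialized


# Hypothetical row counts for a three-dimension cube over attributes a, b, c.
EXAMPLE_SIZES: Dict[View, int] = {
    frozenset("abc"): 100, frozenset("ab"): 50, frozenset("ac"): 75, frozenset("bc"): 20,
    frozenset("a"): 30, frozenset("b"): 10, frozenset("c"): 15, frozenset(): 1,
}

if __name__ == "__main__":
    dims = ["a", "b", "c"]
    views = hypercube_lattice(dims)
    chosen = greedy_select(dims, EXAMPLE_SIZES, k=2)
    print(sorted("".join(sorted(v)) for v in chosen))  # ['ab', 'abc', 'bc']
    # Total query cost before and after materializing the two extra views: 800 380
    print(total_cost({frozenset(dims)}, views, EXAMPLE_SIZES), total_cost(chosen, views, EXAMPLE_SIZES))
```

With these made-up sizes, the first pick is `bc`. It answers four views (bc, b, c, and the empty grouping), and each of those drops from 100 rows to 20, for a benefit of 320. The second pick is `ab`, with a benefit of 100. Raising `k` trades more space for a lower total query cost, which is the space/time tradeoff the abstract refers to. Recomputing every benefit on each round costs roughly O(k · |V|²) lattice checks. That is fine for small cubes, but note that |V| = 2^n grows quickly with the number of dimensions.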