论文信息 - Combining hierarchy encoding and pre-grouping: intelligent grouping in star join processing

Combining hierarchy encoding and pre-grouping: intelligent grouping in star join processing

Efficient star query processing is crucial for a performant data warehouse (DW) implementation and much work is available on physical optimization (e.g., indexing and schema design) and logical optimization (e.g., pre-aggregated materialized views with query rewriting). One important step in the query processing phase is, however, still a bottleneck: the residual join of results from the fact table with the dimension tables in combination with grouping and aggregation. This phase typically consumes between 50% and 80% of the overall processing time. In typical DW scenarios pre-grouping methods only have a limited effect as the grouping is usually specified on the hierarchy levels of the dimension tables and not on the fact table itself. We suggest a combination of hierarchical clustering and pre-grouping as we have implemented in the relational DBMS Transbase. Exploiting hierarchy semantics for the pre-grouping of fact table result tuples is several times faster than conventional query processing. The reason for this is that hierarchical pre-grouping reduces the number of join operations significantly. With this method even queries covering a large part of the fact table can be executed within a time span acceptable for interactive query processing.

[1] Timos K. Sellis,et al. SISYPHUS: A Chunk-Based Storage Manager for OLAP Cubes , 2001, DMDW.

[2] Paul Larson,et al. Grouping and Duplicate Elimination: Benefits of Early Aggregation , 1997 .

[3] Surajit Chaudhuri,et al. An overview of data warehousing and OLAP technology , 1997, SGMD.

[4] Per-Åke Larson,et al. Data reduction by partial preaggregation , 2002, Proceedings 18th International Conference on Data Engineering.

[5] Ashish Gupta,et al. Aggregate-Query Processing in Data Warehousing Environments , 1995, VLDB.

[6] Timos K. Sellis,et al. SISYPHUS: The implementation of a chunk-based storage manager for OLAP data cubes , 2003, Data Knowl. Eng..

[7] Rudolf Bayer,et al. The Universal B-Tree for Multidimensional Indexing: general Concepts , 1997, WWCA.

[8] Per-Ake Larson,et al. Performing Group-By before Join , 1994, ICDE 1994.

[9] C. J. Date,et al. A Guide to SQL Standard, 3rd Edition , 1993 .

[10] Alon Y. Halevy,et al. Query Optimization by Predicate Move-Around , 1994, VLDB.

[11] Jeffrey F. Naughton,et al. Adaptive parallel aggregation algorithms , 1995, SIGMOD '95.

[12] Timos K. Sellis,et al. Processing Star Queries on Hierarchically-Clustered Fact Tables , 2002, VLDB.

[13] Volker Markl,et al. Improving OLAP performance by multidimensional hierarchical clustering , 1999, Proceedings. IDEAS'99. International Database Engineering and Applications Symposium (Cat. No.PR00265).

[14] Per-Åke Larson,et al. Eager Aggregation and Lazy Aggregation , 1995, VLDB.

[15] Kyuseok Shim,et al. Including Group-By in Query Optimization , 1994, VLDB.

[16] Ralph Kimball,et al. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses , 1996 .