Combining hierarchy encoding and pre-grouping: intelligent grouping in star join processing

Efficient star query processing is crucial for a performant data warehouse (DW) implementation and much work is available on physical optimization (e.g., indexing and schema design) and logical optimization (e.g., pre-aggregated materialized views with query rewriting). One important step in the query processing phase is, however, still a bottleneck: the residual join of results from the fact table with the dimension tables in combination with grouping and aggregation. This phase typically consumes between 50% and 80% of the overall processing time. In typical DW scenarios pre-grouping methods only have a limited effect as the grouping is usually specified on the hierarchy levels of the dimension tables and not on the fact table itself. We suggest a combination of hierarchical clustering and pre-grouping as we have implemented in the relational DBMS Transbase. Exploiting hierarchy semantics for the pre-grouping of fact table result tuples is several times faster than conventional query processing. The reason for this is that hierarchical pre-grouping reduces the number of join operations significantly. With this method even queries covering a large part of the fact table can be executed within a time span acceptable for interactive query processing.

[1]  Timos K. Sellis,et al.  SISYPHUS: A Chunk-Based Storage Manager for OLAP Cubes , 2001, DMDW.

[2]  Paul Larson,et al.  Grouping and Duplicate Elimination: Benefits of Early Aggregation , 1997 .

[3]  Surajit Chaudhuri,et al.  An overview of data warehousing and OLAP technology , 1997, SGMD.

[4]  Per-Åke Larson,et al.  Data reduction by partial preaggregation , 2002, Proceedings 18th International Conference on Data Engineering.

[5]  Ashish Gupta,et al.  Aggregate-Query Processing in Data Warehousing Environments , 1995, VLDB.

[6]  Timos K. Sellis,et al.  SISYPHUS: The implementation of a chunk-based storage manager for OLAP data cubes , 2003, Data Knowl. Eng..

[7]  Rudolf Bayer,et al.  The Universal B-Tree for Multidimensional Indexing: general Concepts , 1997, WWCA.

[8]  Per-Ake Larson,et al.  Performing Group-By before Join , 1994, ICDE 1994.

[9]  C. J. Date,et al.  A Guide to SQL Standard, 3rd Edition , 1993 .

[10]  Alon Y. Halevy,et al.  Query Optimization by Predicate Move-Around , 1994, VLDB.

[11]  Jeffrey F. Naughton,et al.  Adaptive parallel aggregation algorithms , 1995, SIGMOD '95.

[12]  Timos K. Sellis,et al.  Processing Star Queries on Hierarchically-Clustered Fact Tables , 2002, VLDB.

[13]  Volker Markl,et al.  Improving OLAP performance by multidimensional hierarchical clustering , 1999, Proceedings. IDEAS'99. International Database Engineering and Applications Symposium (Cat. No.PR00265).

[14]  Per-Åke Larson,et al.  Eager Aggregation and Lazy Aggregation , 1995, VLDB.

[15]  Kyuseok Shim,et al.  Including Group-By in Query Optimization , 1994, VLDB.

[16]  Ralph Kimball,et al.  The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses , 1996 .