The Generalized Pre-Grouping Transformation: Aggregate-Query Optimization in the Presence of Dependencies

One recently proposed technique for the efficient evaluation of OLAP aggregate queries is the use of clustering access methods. These methods store the fact table of a data warehouse clustered according to the dimension hierarchies, using special attributes called hierarchical surrogate keys. New processing and optimization techniques have been proposed for use with these access methods. One important such technique, Hierarchical Pre-Grouping, uses the hierarchical surrogate keys to aggregate the fact table tuples as early as possible and to avoid redundant joins. In this paper, we study the Pre-Grouping transformation, aiming to generalize its applicability and to identify its relationship to other similar transformations. Our results include a general algebraic definition of the Pre-Grouping transformation, along with a formal definition of sufficient conditions for applying it. Using a theorem that we provide, we show that Pre-Grouping can be applied in the presence of functional and inclusion dependencies without the explicit use of hierarchical surrogate keys. An additional result of our study is the definition of the Surrogate-Join transformation, which can modify a join condition using a number of dependencies. To our knowledge, Surrogate-Join does not belong to any of the Semantic Query Transformation types discussed in the past.
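To make the idea of aggregating before joining concrete, the following is a minimal sketch, not the paper's algebraic definition. It assumes a hypothetical toy schema (a fact table of `(store_id, amount)` rows and a dimension mapping `store_id` to `region`), a functional dependency `store_id -> region`, and an inclusion dependency under which every `store_id` in the fact table appears in the dimension. All names are illustrative assumptions. Under those assumptions, grouping the fact rows on the join key first and joining afterwards gives the same totals as joining first.

```python
from collections import defaultdict

# Hypothetical toy data (not from the paper).
# Fact rows: (store_id, amount). Dimension: store_id -> region.
FACT = [(1, 10.0), (1, 5.0), (2, 7.0), (3, 2.5), (3, 1.5)]
# Assumed functional dependency: each store_id determines exactly one region.
# Assumed inclusion dependency: every store_id in FACT is a key of STORE_REGION.
STORE_REGION = {1: "north", 2: "north", 3: "south"}


def join_then_group(fact, store_region):
    """Baseline plan: join every fact row to the dimension, then aggregate."""
    totals = defaultdict(float)
    for store_id, amount in fact:
        region = store_region[store_id]  # one dimension lookup per fact row
        totals[region] += amount
    return dict(totals)


def group_then_join(fact, store_region):
    """Pre-grouped plan: aggregate on the join key, then join the smaller result."""
    partial = defaultdict(float)
    for store_id, amount in fact:
        partial[store_id] += amount  # early aggregation, no join yet
    totals = defaultdict(float)
    for store_id, subtotal in partial.items():
        totals[store_region[store_id]] += subtotal  # one lookup per distinct key
    return dict(totals)


if __name__ == "__main__":
    print(join_then_group(FACT, STORE_REGION))
    print(group_then_join(FACT, STORE_REGION))
```

In this sketch the pre-grouped plan performs one dimension lookup per distinct `store_id` rather than one per fact row, which is the kind of saving that early aggregation aims for. The equivalence relies on the two assumed dependencies and on `SUM` being decomposable into partial sums.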
