Accelerating queries with group-by and join by groupjoin

Most aggregation queries contain both group-by and join operators, and spend a significant amount of time evaluating these two expensive operators. Merging them into one operator (the groupjoin) significantly speeds up query execution. We introduce two main equivalences to allow for the merging and prove their correctness. Furthermore, we show experimentally that these equivalences can significantly speed up TPC-H.
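To make the idea of merging the two operators concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a simplified setting in which the join key is unique in the left input and is also the grouping key. The table and field names (customers, orders, id, cust, amount) are hypothetical. The baseline builds one hash table for the join and a second for the aggregation. The merged variant probes a single hash table and aggregates in place.

```python
from collections import defaultdict


def join_then_group(customers, orders):
    """Baseline: hash join followed by a separate hash aggregation."""
    # First hash table: built on the join key of the left input.
    by_key = {c["id"]: c for c in customers}
    # Materialize the join result (inner join semantics).
    joined = [(by_key[o["cust"]], o) for o in orders if o["cust"] in by_key]
    # Second hash table: groups the join result on the same key.
    totals = defaultdict(float)
    for c, o in joined:
        totals[c["id"]] += o["amount"]
    return dict(totals)


def groupjoin(customers, orders):
    """Merged operator: one hash table serves both join and aggregation."""
    # One entry per left tuple, holding [running sum, number of matches].
    groups = {c["id"]: [0.0, 0] for c in customers}
    for o in orders:
        g = groups.get(o["cust"])
        if g is not None:
            g[0] += o["amount"]
            g[1] += 1
    # Drop left tuples without a join partner so that the result matches
    # an inner join followed by group-by (an assumption of this sketch).
    return {k: s for k, (s, n) in groups.items() if n > 0}


customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"cust": 1, "amount": 10.0},
    {"cust": 1, "amount": 5.0},
    {"cust": 2, "amount": 7.5},
    {"cust": 4, "amount": 1.0},
]

baseline = join_then_group(customers, orders)
merged = groupjoin(customers, orders)
```

Under the stated assumptions both functions return the same mapping from key to aggregate, while the merged variant avoids materializing the join result and building the second hash table. When the key assumption does not hold, the two plans are not interchangeable in general, which is why the equivalences need preconditions and proofs.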
