A Generalized Join Algorithm

Database query processing traditionally relies on three alternative join algorithms: index nested loops join exploits an index on its inner input, merge join exploits sorted inputs, and hash join exploits differences in the sizes of the join inputs. Cost-based query optimization chooses the most appropriate algorithm for each query and for each operation. Unfortunately, mistaken algorithm choices during compile-time query optimization are common yet expensive to investigate and to resolve. Our goal is to end mistaken choices among join algorithms by replacing the three traditional join algorithms with a single one. Like merge join, this new join algorithm exploits sorted inputs. Like hash join, it exploits different input sizes for unsorted inputs. In fact, for unsorted inputs, the cost functions for recursive hash join and for hybrid hash join have guided our search for the new join algorithm. In consequence, the new join algorithm can replace both merge join and hash join in a database management system. The in-memory components of the new join algorithm employ indexes. If the database contains indexes for one (or both) of the inputs, the new join can exploit persistent indexes instead of temporary in-memory indexes. Using database indexes to match input records, the new join algorithm can also replace index nested loops join. Results from an implementation of the core algorithm are reported.

[1]  Alfons Kemper,et al.  Generalised Hash Teams for Join and Group-by , 1999, VLDB.

[2]  Jennifer Widom,et al.  Content-Based Routing: Different Plans for Different Data , 2005, VLDB.

[3]  Goetz Graefe,et al.  Query evaluation techniques for large databases , 1993, CSUR.

[4]  Guangqiang Li On the design and evaluation of a new order-based join algorithm , 2010 .

[5]  Masaya Nakayama,et al.  Hash-Partitioned Join Method Using Dynamic Destaging Strategy , 1988, VLDB.

[6]  Yannis E. Ioannidis,et al.  Randomized algorithms for optimizing large join queries , 1990, SIGMOD '90.

[7]  Donald E. Knuth,et al.  The Art of Computer Programming: Volume 3: Sorting and Searching , 1998 .

[8]  Joseph M. Hellerstein,et al.  Eddies: continuously adaptive query processing , 2000, SIGMOD '00.

[9]  Goetz Graefe,et al.  Hash Joins and Hash Teams in Microsoft SQL Server , 1998, VLDB.

[10]  Quanzhong Li,et al.  Adaptively Reordering Joins during Query Execution , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[11]  C. Mohan,et al.  Single Table Access Using Multiple Indexes: Optimization, Execution, and Concurrency Control Techniques , 1990, EDBT.

[12]  Hidehiko Tanaka,et al.  An Overview of The System Software of A Parallel Relational Database Machine GRACE , 1986, VLDB.

[13]  Sven Helmer,et al.  Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships , 1998, VLDB.

[14]  Michael J. Carey,et al.  On saying “Enough already!” in SQL , 1997, SIGMOD '97.

[15]  Volker Markl,et al.  LEO: An autonomic query optimizer for DB2 , 2003, IBM Syst. J..

[16]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.