Hash Joins and Hash Teams in Microsoft SQL Server

The query execution engine in Microsoft SQL Server employs hash-based algorithms for inner and outer joins, semi-joins, set operations (such as intersection), grouping, and duplicate removal. The implementation combines many techniques proposed individually in the research literature but never combined in a single implementation, neither in a product nor in a research prototype. One of the paper’s contributions is a design that cleanly integrates most existing techniques. One technique, however, which we call hash teams and which has previously been described only in vague terms, has not been implemented in prior research or product work. It realizes in hash-based query processing many of the benefits of interesting orderings in sort-based query processing. Moreover, we describe how memory is managed in complex and bushy query evaluation plans with multiple sort and hash operations. Finally, we report on the effectiveness of hashing using two very typical database queries, including the performance effects of hash teams.

[1]  Goetz Graefe,et al.  Query evaluation techniques for large databases , 1993, CSUR.

[2]  Hansjörg Zeller,et al.  An Adaptive Hash Join Algorithm for Multiuser Environments , 1990, VLDB.

[3]  Masaya Nakayama,et al.  The Effect of Bucket Size Tuning in the Dynamic Hybrid GRACE Hash Join Method , 1989, VLDB.

[4]  Kjell Bratbergsengen,et al.  Hashing Methods and Relational Algebra Operations , 1984, VLDB.

[5]  GraefeGoetz Query evaluation techniques for large databases , 1993 .

[6]  David J. DeWitt,et al.  Tradeoffs in Processing Complex Join Queries via Hashing in Multiprocessor Database Machines , 1990, VLDB.

[7]  SaccoGiovanni Maria Fragmentation: a technique for efficient query processing , 1986 .

[8]  David J. DeWitt,et al.  Nested loops revisited , 1993, [1993] Proceedings of the Second International Conference on Parallel and Distributed Information Systems.

[9]  Giovanni Maria Sacco Fragmentation: a technique for efficient query processing , 1986, TODS.

[10]  Jeffrey F. Naughton,et al.  Query execution techniques for caching expensive methods , 1996, SIGMOD '96.

[11]  Patricia G. Selinger,et al.  Access path selection in a relational database management system , 1979, SIGMOD '79.

[12]  Jim Gray,et al.  The 5 minute rule for trading memory for disc accesses and the 10 byte rule for trading memory for CPU time , 1987, SIGMOD '87.

[13]  Goetz Graefe,et al.  Sort-merge-join: an idea whose time has(h) passed? , 1994, Proceedings of 1994 IEEE 10th International Conference on Data Engineering.

[14]  Michael Stonebraker,et al.  Implementation techniques for main memory database systems , 1984, SIGMOD '84.

[15]  David J. DeWitt,et al.  Multiprocessor Hash-Based Join Algorithms , 1985, VLDB.

[16]  Hidehiko Tanaka,et al.  An Overview of The System Software of A Parallel Relational Database Machine GRACE , 1986, VLDB.