On parallel execution of multiple pipelined hash joins

In this paper we study the parallel execution of multiple pipelined hash joins. Specifically, we address two issues, processor allocation and the use of hash filters, to improve the parallel execution of hash joins. We first present a scheme that transforms a bushy execution tree into an allocation tree, in which each node denotes a pipeline. Processors are then allocated to the nodes of the allocation tree based on the concept of synchronous execution time, so that the inner relations (i.e., hash tables) in a pipeline become available at approximately the same time. In addition, hash filtering is investigated as a means of further improving overall performance. Simulation studies are conducted to demonstrate the importance of processor allocation and to evaluate various schemes that use hash filters. The results indicate that processor allocation based on the allocation tree significantly outperforms allocation based on the original bushy tree, and that the effect of hash filtering becomes more prominent as the number of relations in a query increases.
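
To make the idea of hash filtering concrete, the following is a minimal single-process sketch in Python. It is not the scheme evaluated in this paper. It assumes the common formulation in which a hash filter is a bit array built from the join-attribute values of one relation and then used to discard tuples of the other relation that cannot find a match. The function names (`build_hash_filter`, `apply_hash_filter`, `hash_join`), the tuple-based row layout, and the filter size are all hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Tuple[Hashable, ...]


def build_hash_filter(rows: Sequence[Row], key_index: int, num_bits: int) -> bytearray:
    """Mark one slot per hashed join-attribute value (one byte per slot, for simplicity)."""
    bits = bytearray(num_bits)
    for row in rows:
        bits[hash(row[key_index]) % num_bits] = 1
    return bits


def apply_hash_filter(rows: Sequence[Row], key_index: int, bits: bytearray) -> List[Row]:
    """Keep only rows whose join attribute hashes to a marked slot.

    False positives are possible (hash collisions); false negatives are not.
    """
    n = len(bits)
    return [row for row in rows if bits[hash(row[key_index]) % n]]


def hash_join(build_rows: Sequence[Row], probe_rows: Sequence[Row],
              build_key: int, probe_key: int) -> List[Row]:
    """Plain hash join: build a table on the inner relation, probe with the outer."""
    table: Dict[Hashable, List[Row]] = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    return [b + p for p in probe_rows for b in table.get(p[probe_key], [])]


# Hypothetical toy relations: R(key, payload) is the inner, S(key, payload) the outer.
R = [(1, "r1"), (2, "r2"), (3, "r3")]
S = [(2, "s2"), (3, "s3"), (4, "s4"), (5, "s5"), (10, "s10")]

bits = build_hash_filter(R, key_index=0, num_bits=8)
S_filtered = apply_hash_filter(S, key_index=0, bits=bits)

# The filter only removes tuples that cannot join, so the join result is unchanged.
joined_plain = hash_join(R, S, 0, 0)
joined_filtered = hash_join(R, S_filtered, 0, 0)
```

In this toy run the filter removes the tuples of `S` with keys 4 and 5 before the join, while the tuple with key 10 survives as a false positive (it collides with key 2 in an 8-slot filter) and is eliminated by the join itself. The benefit in a pipeline comes from shrinking intermediate inputs early; how filters are generated and applied across the nodes of a bushy tree is the subject of the schemes compared in the simulation study.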
