Limits of Parallelism in Hash Join Algorithms

Abstract: The performance of parallel hash join algorithms is analyzed in an environment where several join queries run concurrently. Analytical models for predicting the throughput and response time of join queries are developed. We consider two important parallel join algorithms: hybrid hash and Grace join. The effect of skew on the performance of these algorithms is examined. Results based on the analytical models, as well as simulation results, are presented. Some of the results are quite unusual. For instance, in the case of the hybrid hash algorithm, we show that, under heavy load, the curve of response time versus degree of parallelism can have two local minima. We establish a simple rule of thumb for choosing the degree of parallelism that maximizes the throughput of the hybrid hash algorithm. In the case of Grace join, we derive asymptotic conditions on the amount of skew for a limit on parallelism to exist.
