Performance Analysis of a Load Balancing Hash-Join Algorithm for a Shared Memory Multiprocessor

Over the last several years, there has been growing interest in applying general multiprocessor systems to relational database query processing. Efficient parallel algorithms have been designed for the join operation, but their performance usually deteriorates greatly when the data is nonuniform. In this paper, we propose a new version of the hash-based join algorithm that balances the load among the processors, for any given bucket, in a shared-everything environment. We develop an analytical model of the algorithm's cost and implement the algorithm on a shared-memory multiprocessor. We also perform a number of experiments comparing our model with our empirical results.
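
To make the idea of balancing load within a bucket concrete, here is a minimal Python sketch. It is not the algorithm from the paper, whose details are not given in this abstract. It only illustrates one plausible reading: both relations are hash-partitioned into buckets, and the probe work for each bucket is split evenly across all processors instead of assigning a whole bucket to one processor. The function names (`partition`, `split_evenly`, `balanced_hash_join`), the tuple layout, and the sequential simulation of workers are all assumptions made for illustration.

```python
from collections import defaultdict


def partition(tuples, key, num_buckets):
    """Hash-partition tuples into num_buckets lists on the field at index `key`."""
    buckets = [[] for _ in range(num_buckets)]
    for t in tuples:
        buckets[hash(t[key]) % num_buckets].append(t)
    return buckets


def split_evenly(items, num_workers):
    """Split items into num_workers contiguous slices whose sizes differ by at most one."""
    base, extra = divmod(len(items), num_workers)
    slices, start = [], 0
    for w in range(num_workers):
        end = start + base + (1 if w < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def balanced_hash_join(r, s, r_key, s_key, num_buckets, num_workers):
    """Join r and s on r[r_key] == s[s_key].

    Workers are simulated sequentially; in a shared-memory setting each
    slice would be probed by a separate processor against the same
    in-memory hash table. Returns the joined pairs and the number of
    probes each (hypothetical) worker performed.
    """
    r_buckets = partition(r, r_key, num_buckets)
    s_buckets = partition(s, s_key, num_buckets)
    result = []
    probes_per_worker = [0] * num_workers
    for r_bucket, s_bucket in zip(r_buckets, s_buckets):
        # Build phase: one shared hash table per bucket (built serially here).
        table = defaultdict(list)
        for t in r_bucket:
            table[t[r_key]].append(t)
        # Probe phase: every worker receives an equal share of this bucket,
        # so a skewed (oversized) bucket does not overload a single worker.
        for w, chunk in enumerate(split_evenly(s_bucket, num_workers)):
            for t in chunk:
                probes_per_worker[w] += 1
                for match in table.get(t[s_key], ()):
                    result.append((match, t))
    return result, probes_per_worker


if __name__ == "__main__":
    # Skewed input: 90 of the 100 tuples in s share join key 0.
    r = [(i, f"r{i}") for i in range(4)]
    s = [(0, j) for j in range(90)] + [(1 + j % 3, 90 + j) for j in range(10)]
    pairs, probes = balanced_hash_join(r, s, 0, 0, num_buckets=4, num_workers=4)
    print(len(pairs), probes)
```

In this sketch the per-worker probe counts can differ by at most one per bucket, regardless of how skewed the bucket sizes are. A scheme that assigns whole buckets to workers would instead give one worker the entire oversized bucket.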