A parallel hash-based join algorithm for a networked cluster of multiprocessor nodes

Hash joins are expensive but important operations in relational database systems, and parallelizing them is a well-known way to improve their performance. Because networked clusters of nodes are widely available as parallel processing environments, offering low cost, high speed, and ease of use, we developed a parallel hash-based join algorithm for a networked cluster of multiprocessor nodes. The algorithm has two distinctive features. First, it exploits a parallel and distributed environment in which the nodes of the networked cluster are shared-memory multiprocessor computers. Second, it integrates a distributed shared virtual space into its design, which simplifies both the algorithm and its implementation. In this paper, we present the design ideas, describe the parallel hash-based join algorithm, evaluate its performance, and give a dynamic changing message model for use in the presence of skew.
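For readers unfamiliar with the general technique, the following is a minimal sketch of a generic partitioned parallel hash join, not the algorithm described in this paper. It assumes both relations are lists of tuples, hashes each tuple's join key to choose one of `num_nodes` partitions so that matching tuples land in the same partition, and then runs an independent build/probe hash join per partition. A thread pool stands in for the cluster nodes purely to show the data flow; the distributed shared virtual space and the skew-handling message model are not modeled. All function names (`partition`, `local_hash_join`, `parallel_hash_join`) are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition(relation, key_index, num_parts):
    """Split tuples into num_parts buckets by hashing the join key.

    Equal keys always hash to the same bucket, so matching tuples from
    both relations end up in the same partition (i.e. on the same "node").
    """
    parts = [[] for _ in range(num_parts)]
    for row in relation:
        parts[hash(row[key_index]) % num_parts].append(row)
    return parts


def local_hash_join(build_rows, probe_rows, build_key, probe_key):
    """Classic build/probe hash join on a single partition."""
    # Build phase: index the build-side tuples by join key.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)
    # Probe phase: look up each probe-side tuple and emit concatenated matches.
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], ()):
            out.append(match + row)
    return out


def parallel_hash_join(r, s, r_key, s_key, num_nodes=4):
    """Hash-partition R and S, then join each partition pair concurrently.

    Threads here only simulate nodes; in CPython they do not give true
    CPU parallelism because of the global interpreter lock.
    """
    r_parts = partition(r, r_key, num_nodes)
    s_parts = partition(s, s_key, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = pool.map(
            local_hash_join,
            r_parts,
            s_parts,
            [r_key] * num_nodes,
            [s_key] * num_nodes,
        )
    # Gather the per-partition results into one output relation.
    return [row for part in results for row in part]
```

For example, `parallel_hash_join([(1, "a"), (3, "c")], [(1, "x"), (3, "z")], 0, 0)` joins on the first column of each tuple. The sketch also shows why skew matters: if many tuples share one key, they all fall into a single partition, and that one worker does a disproportionate share of the build and probe work.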