Processing multi-join query in parallel systems

In parallel systems, a number of joins from one or more queries can be executed either serially or in parallel. While serial execution assigns all processors to execute each join one after another, parallel execution distributes the joins to clusters, each formed by a certain number of processors, and executes them concurrently. Both approaches employ parallelism to improve system performance. However, data skew may result in load imbalance among processors executing the same join, and some clusters may be overloaded with time-consuming joins. As a result, the completion time can be much longer than expected. In this paper, we propose an algorithm to further minimize the completion time of concurrently executed multiple joins. In this algorithm, all the joins to be executed concurrently are decomposed into a set of tasks that are ordered by decreasing task size. These tasks are dynamically allocated to available processors during execution. Our performance study shows that the proposed algorithm outperforms previously proposed approaches, especially when the number of processors increases, high skewness is present in the relations to be joined, and relation sizes are large.
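The allocation idea described above (tasks ordered by decreasing size, each handed to whichever processor becomes available next) can be illustrated with a small simulation. This is only a minimal sketch, not the paper's algorithm: the abstract does not say how joins are decomposed into tasks or how task size is estimated, so the function name `schedule_tasks`, the abstract cost units, and the sample task sizes are all assumptions made for illustration.

```python
import heapq


def schedule_tasks(task_sizes, num_processors):
    """Simulate largest-first dynamic allocation of tasks to processors.

    task_sizes: hypothetical task costs in abstract time units.
    Returns (completion_time, assignment), where assignment maps each
    processor id to the list of task sizes it executed.
    """
    # Min-heap of (time_at_which_processor_becomes_free, processor_id).
    free_at = [(0, p) for p in range(num_processors)]
    heapq.heapify(free_at)
    assignment = {p: [] for p in range(num_processors)}

    # Order tasks by decreasing size, then give each one to the
    # processor that becomes available first.
    for size in sorted(task_sizes, reverse=True):
        available_time, proc = heapq.heappop(free_at)
        assignment[proc].append(size)
        heapq.heappush(free_at, (available_time + size, proc))

    # The overall completion time is when the last processor finishes.
    completion_time = max(t for t, _ in free_at)
    return completion_time, assignment


if __name__ == "__main__":
    # Assumed task sizes from several joins, including one skewed (large) task.
    sizes = [7, 5, 4, 3, 3, 2]
    total_time, plan = schedule_tasks(sizes, num_processors=3)
    print(total_time)
    print(plan)
```

Because large tasks are placed first, the small tasks left at the end can fill in the gaps between processors, which is why this ordering tends to keep the load balanced even when a few tasks are much larger than the rest.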
