Applying Segmented Right-Deep Trees to Pipelining Multiple Hash Joins

This paper explores the pipelined execution of multijoin queries in a multiprocessor-based database system. With hash-based joins, multiple joins can be pipelined: early results from one join are sent to the next join for processing before the first join has completed. The execution of a query is usually represented by a query execution tree. To improve the execution of pipelined hash joins, we propose an approach to query execution tree selection that exploits segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment. Guided by this model, we then develop heuristic schemes that determine a query execution plan based on a segmented right-deep tree so that the query can be executed efficiently. Our simulation shows that the proposed approach adds no overhead to plan execution, offers more flexibility in query plan generation, and can yield query plans that perform better than those achievable by previous schemes based on right-deep trees.
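The build-then-probe pipelining described above can be illustrated with a small single-process Python sketch. It is not the authors' implementation and leaves out the paper's analytical model, processor allocation, and heuristics. The names `Stage`, `Segment`, `materialize`, and `run_segment` are hypothetical. Within a segment, all hash tables are built first. The probe input then streams through the chain of joins as Python generators, so each joined tuple reaches the next join without waiting for the previous join to finish. As a simplifying assumption, a segment's result is fully materialized before another segment uses it as an input. This is how the sketch combines right-deep segments into a bushy tree.

```python
# Minimal sketch (hypothetical names); not the paper's implementation.
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union

Row = Dict[str, object]  # a tuple represented as attribute -> value


@dataclass
class Stage:
    """One hash join in a pipeline segment (hypothetical structure)."""
    build: "Input"   # input whose hash table is built before any probing
    build_key: str   # join attribute on the build side
    probe_key: str   # join attribute on the streaming (probe) side


@dataclass
class Segment:
    """A right-deep segment: one probe input streamed through a chain of hash joins."""
    probe: "Input"
    stages: List[Stage] = field(default_factory=list)


# An input is either a base relation or another segment (assumed to be materialized).
Input = Union[List[Row], Segment]


def materialize(inp: Input) -> List[Row]:
    """Return a base relation as-is, or execute a segment and store its whole result."""
    if isinstance(inp, Segment):
        return list(run_segment(inp))
    return inp


def build_hash_table(rows: List[Row], key: str) -> Dict[object, List[Row]]:
    """Hash every row of the build input on its join attribute."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in rows:
        table[row[key]].append(row)
    return table


def probe_stage(stream: Iterator[Row], table: Dict[object, List[Row]],
                key: str) -> Iterator[Row]:
    """Probe one hash table, yielding each joined tuple as soon as it is produced."""
    for row in stream:
        for match in table.get(row[key], []):
            yield {**row, **match}


def run_segment(seg: Segment) -> Iterator[Row]:
    """Execute one segment: build all hash tables, then pipeline the probe input."""
    # Build phase: every hash table in the segment exists before probing starts.
    tables = [build_hash_table(materialize(st.build), st.build_key)
              for st in seg.stages]
    # Probe phase: chained generators pass each output tuple straight to the next join.
    stream: Iterator[Row] = iter(materialize(seg.probe))
    for st, table in zip(seg.stages, tables):
        stream = probe_stage(stream, table, st.probe_key)
    return stream


# Usage: segment 1 streams S through joins with R (on a) and then T (on b).
R = [{"a": 1, "x": "r1"}, {"a": 2, "x": "r2"}]
S = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]
T = [{"b": 10, "y": "t1"}, {"b": 20, "y": "t2"}]
U = [{"y": "t1", "z": "u1"}]
seg1 = Segment(probe=S, stages=[Stage(R, "a", "a"), Stage(T, "b", "b")])
# Segment 2 uses the materialized result of segment 1 as a build input.
seg2 = Segment(probe=U, stages=[Stage(seg1, "y", "y")])
print(list(run_segment(seg2)))
# prints [{'y': 't1', 'z': 'u1', 'a': 1, 'b': 10, 'x': 'r1'}]
```

In this example, segment 1 is a right-deep pipeline over R, S, and T. Segment 2 uses segment 1's stored result as a build input, so the overall plan is a bushy tree of two right-deep subtrees. Because generators are lazy, a segment's probe phase does no work until its output is consumed.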
