Using Segmented Right-Deep Trees for the Execution of Pipelined Hash Joins

In this paper, we explore the execution of pipelined hash joins in a multiprocessor-based database system. To improve query execution, we propose a new approach to query execution tree selection that exploits segmented right-deep trees, which are bushy trees of right-deep subtrees. We first derive an analytical model for the execution of a pipeline segment and then, in light of the model, develop heuristic schemes that determine a query execution plan based on a segmented right-deep tree so that the query can be executed efficiently. Our simulation shows that the proposed approach, without incurring additional overhead on plan execution, offers more flexibility in query plan generation and leads to query plans that perform significantly better than those achievable by previous schemes using right-deep trees.
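To make the tree shape concrete, the following is a minimal single-process sketch of how a segmented right-deep tree might be executed. It is not the paper's analytical model or its plan-selection heuristics, and it ignores processor allocation entirely. The names `build_hash_table` and `run_segment` and the sample relations are hypothetical, chosen only for illustration. Within one right-deep segment, hash tables are built on all inner relations and each outer tuple is probed through them in a pipeline; the output of one segment is then materialized and used as an inner relation of another segment, which gives the bushy structure.

```python
from collections import defaultdict


def build_hash_table(relation, key):
    """Build phase: index the tuples of an inner relation by join key."""
    table = defaultdict(list)
    for row in relation:
        table[row[key]].append(row)
    return table


def run_segment(outer, inners):
    """Run one right-deep segment (illustrative only).

    `inners` is a list of (relation, join_key) pairs. All hash tables are
    built first; each outer tuple is then probed through every table in
    turn, so no intermediate result inside the segment is materialized.
    """
    tables = [(build_hash_table(rel, key), key) for rel, key in inners]

    def probe(row, stage):
        if stage == len(tables):
            yield row
            return
        table, key = tables[stage]
        for match in table.get(row[key], ()):
            yield from probe({**row, **match}, stage + 1)

    for row in outer:
        yield from probe(row, 0)


# Hypothetical sample relations.
orders = [
    {"oid": 1, "cid": 10, "pid": 100},
    {"oid": 2, "cid": 11, "pid": 101},
    {"oid": 3, "cid": 10, "pid": 101},
]
customers = [{"cid": 10, "cname": "Ann"}, {"cid": 11, "cname": "Bob"}]
products = [{"pid": 100, "sid": 7}, {"pid": 101, "sid": 8}]
suppliers = [{"sid": 7, "sname": "Acme"}, {"sid": 8, "sname": "Bolt"}]

# Segment 1: products probe a hash table on suppliers; result is materialized.
segment1 = list(run_segment(products, [(suppliers, "sid")]))

# Segment 2: orders are pipelined through customers and the result of segment 1.
result = list(run_segment(orders, [(customers, "cid"), (segment1, "pid")]))
```

A plain right-deep tree would correspond to a single call to `run_segment` with every inner relation's hash table held in memory at once. Splitting the plan into segments, as in this sketch, is one way to see where the extra flexibility in plan generation comes from.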
