Partitioning Pipelines with Communication Costs

We consider the problem of scheduling a database query execution graph on a parallel machine. Specifically, we study how to data-partition pipelined operators so as to minimize response time, a basic problem in scheduling database execution trees. Partitioning promises increased parallelism and memory availability at the price of greater communication overhead. Current partitioning methods [BB90, TWPY92, LCRY93, NSHL93] do not consider this trade-off. We present a mathematical framework within which these alternatives can be quantified for many practical scenarios of interest. We then present an algorithm whose performance is within a factor of 2 of the optimum.

[1] David J. DeWitt, et al. Complex query processing in multiprocessor database machines, 1990.

[2] Hongjun Lu, et al. Optimization of Multi-Way Join Queries for Parallel Execution, 1991, VLDB.

[3] Hamid Pirahesh, et al. Parallelism in relational data base systems: architectural issues and design approaches, 1990, DPDS '90.

[4] Richard T. Snodgrass, et al. Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data: SIGMOD '94, Minneapolis, Minnesota, May 24-27, 1994, 1994, ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems.

[5] David J. DeWitt, et al. Parallel database systems: the future of high performance database systems, 1992, CACM.

[6] Jaideep Srivastava, et al. Optimizing multi-join queries in parallel relational databases, 1993, Proceedings of the Second International Conference on Parallel and Distributed Information Systems.

[7] Wei Hong, et al. Exploiting inter-operation parallelism in XPRS, 1992, SIGMOD '92.

[8] Hongjun Lu, et al. On Resource Scheduling of Multi-Join Queries in Parallel Database Systems, 1993, Inf. Process. Lett.

[9] Patrick Valduriez, et al. On the Effectiveness of Optimization Search Strategies for Parallel Execution Spaces, 1993, VLDB.

[10] Prithviraj Banerjee, et al. An Approximate Algorithm for the Partitionable Independent Task Scheduling Problem, 1990, ICPP.

[11] Mikal Ziane, et al. Parallel query processing in DBS3, 1993, Proceedings of the Second International Conference on Parallel and Distributed Information Systems.

[12] Philip S. Yu, et al. On optimal processor allocation to support pipelined hash joins, 1993, SIGMOD Conference.

[13] Kian-Lee Tan, et al. Multi-Join Optimization for Symmetric Multiprocessors, 1993, VLDB.

[14] Philip S. Yu, et al. On parallel execution of multiple pipelined hash joins, 1994, SIGMOD '94.

[15] Ronald L. Graham, et al. Bounds on Multiprocessing Timing Anomalies, 1969, SIAM Journal on Applied Mathematics.

[16] Rajeev Motwani, et al. Optimization Algorithms for Exploiting the Parallelism-Communication Tradeoff in Pipelined Parallelism, 1994, VLDB.

[17] Donovan A. Schneider, et al. The Gamma Database Machine Project, 1990, IEEE Trans. Knowl. Data Eng.

[18] Krishna R. Pattipati, et al. Scheduling parallelizable tasks: putting it all on the shelf, 1992, SIGMETRICS '92/PERFORMANCE '92.

[19] Sumit Ganguly. Parallel Evaluation of Deductive Database Queries, 1992.

[20] Sushil Jajodia, et al. Proceedings of the 1993 ACM SIGMOD international conference on Management of data, 1993, SIGMOD 1993.

[21] Patricia G. Selinger, et al. Access path selection in a relational database management system, 1979, SIGMOD '79.

[22] Sumit Ganguly, et al. Query optimization for parallel execution, 1992, SIGMOD '92.