Skew Handling in the DBS3 Parallel Database System

Parallel query execution often delivers smaller gains than expected because of high start-up time, interference between execution entities, and poor load balancing. In this paper, we present a solution that reduces these limitations in DBS3, a shared-memory parallel database system. The solution combines static data partitioning with dynamic processor allocation so that execution adapts to its context. It makes DBS3 almost insensitive to data skew and decouples the degree of parallelism from the degree of data partitioning. To address load balancing in the presence of data skew, we analyze three factors that influence the behavior of our parallel execution model: the skew factor, the degree of parallelism, and the degree of partitioning. We report on experiments that vary these three parameters with the DBS3 prototype on a 72-node KSR1 multiprocessor. The results demonstrate high performance gains, even with highly skewed data.
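The interplay of skew factor, degree of parallelism, and degree of partitioning can be illustrated with a small simulation. The sketch below is not the DBS3 implementation: the function names, the Zipf-like weighting `1 / i**skew`, the round-robin static binding, and the largest-first order of the shared queue are all assumptions made for illustration. It compares the completion time when partitions are bound to workers in advance with the completion time when each idle worker takes the next partition from a shared queue.

```python
import heapq


def zipf_partition_sizes(total_tuples, num_partitions, skew):
    """Split total_tuples over partitions with weights 1 / i**skew.

    skew = 0 gives a uniform split; larger values concentrate tuples
    in the first partitions (an assumed, Zipf-like skew model).
    """
    weights = [1.0 / (i ** skew) for i in range(1, num_partitions + 1)]
    norm = sum(weights)
    return [total_tuples * w / norm for w in weights]


def static_makespan(sizes, num_workers):
    """Bind partitions to workers round-robin before execution starts."""
    load = [0.0] * num_workers
    for i, size in enumerate(sizes):
        load[i % num_workers] += size
    return max(load)


def dynamic_makespan(sizes, num_workers):
    """Let the worker that becomes idle first take the next partition.

    The shared queue is ordered largest partition first, which is an
    assumption of this sketch rather than something the paper states.
    """
    finish_times = [0.0] * num_workers
    heapq.heapify(finish_times)
    for size in sorted(sizes, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + size)
    return max(finish_times)


if __name__ == "__main__":
    workers = 4
    for skew in (0.0, 0.5, 1.0):
        sizes = zipf_partition_sizes(100_000, 64, skew)
        print(
            f"skew={skew}: static={static_makespan(sizes, workers):.0f} "
            f"dynamic={dynamic_makespan(sizes, workers):.0f}"
        )
```

In this model, the number of partitions (64) is independent of the number of workers (4). With many more partitions than workers, an idle worker can keep picking up remaining partitions, so a few large partitions have less effect on the overall completion time than under a fixed binding.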
