Effect of Skew on Join Performance in Parallel Architectures

Skew in the distribution of values taken by an attribute is identified as a major factor that can affect the performance of parallel architectures for relational joins. The effect of skew on the performance of two parallel architectures is evaluated using analytic models. In one architecture, called the database machine (DBMC), both data and processing power are distributed, while in the other, called Single Processor Parallel Input/Output (SPPI), data is distributed but the processing power is concentrated in one processor. The two architectures are compared in terms of the ratio of MIPS used by DBMC and SPPI to deliver the same throughput and response time. In addition, the horizontal growth potential of DBMC is evaluated in terms of the maximum speedup achievable by DBMC relative to the SPPI response time. Both the MIPS ratio and the speedup are found to be very sensitive to the amount of skew. These results suggest that careful thought should be given to parallelizing database applications and to the design of algorithms and query optimizers for parallel architectures.
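
To illustrate why skew limits speedup when processing is distributed, the following is a minimal sketch and not the analytic model used in the paper. It assumes a Zipf-like distribution of join-attribute values (the parameter `theta` is hypothetical, with `theta = 0` meaning uniform), a simple modulo partitioning of values across processors, and a join phase whose duration is set by the most heavily loaded processor. All function names are illustrative.

```python
def zipf_weights(num_values, theta):
    """Normalized Zipf-like weights; theta = 0 is uniform, larger theta is more skewed."""
    raw = [1.0 / (rank ** theta) for rank in range(1, num_values + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def partition_loads(num_values, theta, num_processors, num_tuples):
    """Expected tuples per processor when value v is assigned to processor v % P.

    Modulo partitioning is an assumption made for this sketch only.
    """
    weights = zipf_weights(num_values, theta)
    loads = [0.0] * num_processors
    for value, weight in enumerate(weights):
        loads[value % num_processors] += weight * num_tuples
    return loads


def speedup_bound(loads):
    """Upper bound on speedup: total work divided by the busiest processor's work."""
    return sum(loads) / max(loads)


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        loads = partition_loads(num_values=1000, theta=theta,
                                num_processors=10, num_tuples=1_000_000)
        print(f"theta={theta}: speedup bound = {speedup_bound(loads):.2f}")
```

Under these assumptions, the bound equals the number of processors when values are uniform and drops as `theta` grows, because the processor holding the most frequent values becomes the bottleneck.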
