Effectiveness of Parallel Joins

The effectiveness of parallel processing of relational join operations is examined. The skew in the distribution of join attribute values and the stochastic nature of the task processing times are identified as the major factors that can limit the effective exploitation of parallelism. Expressions for the execution time of parallel hash join and semijoin are derived and their effectiveness is analyzed. When the parallel architecture uses many small processors, skew can turn some processors into bottlenecks while others remain underutilized. Even in the absence of skew, variations in the processing times of the parallel tasks belonging to a query can lead to high task synchronization delay and reduce the maximum speedup achievable through parallel execution. For example, when the task processing time on each processor is exponentially distributed with the same mean, the speedup is proportional to P/ln(P), where P is the number of processors. Other factors, such as memory size and communication bandwidth, can lead to even lower speedup. These effects are quantified using analytical models.
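
The P/ln(P) figure in the example above can be checked with a short sketch. It is not the paper's model. It assumes one task per processor, independent exponentially distributed task times with a common mean, and a query that finishes only when the slowest task finishes. Under those assumptions the expected parallel time is the mean times the harmonic number H_P, which grows like ln(P), so the speedup is P/H_P. The function names below are hypothetical and chosen for illustration only.

```python
import math
import random


def harmonic(p: int) -> float:
    """Return H_p = 1 + 1/2 + ... + 1/p."""
    return sum(1.0 / k for k in range(1, p + 1))


def analytic_speedup(p: int) -> float:
    """Speedup for p tasks with i.i.d. exponential processing times.

    Assumption: serial time is p * mean, and parallel time is the expected
    maximum of p exponentials, which equals mean * H_p. The mean cancels.
    """
    return p / harmonic(p)


def simulated_speedup(p: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same ratio, with the mean task time set to 1."""
    rng = random.Random(seed)
    total_max = 0.0
    for _ in range(trials):
        # The query completes when the slowest of the p tasks completes.
        total_max += max(rng.expovariate(1.0) for _ in range(p))
    expected_parallel_time = total_max / trials
    return p / expected_parallel_time


if __name__ == "__main__":
    print("P  analytic  simulated  P/ln(P)")
    for p in (2, 8, 32, 128):
        print(p,
              round(analytic_speedup(p), 2),
              round(simulated_speedup(p), 2),
              round(p / math.log(p), 2))
```

Because H_P is approximately ln(P) + 0.577, the analytic value sits somewhat below P/ln(P) for small P and approaches the same growth rate as P increases, which is consistent with the proportionality stated in the abstract.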
