Paper information - Adaptive parallel hash join in main-memory databases

Adaptive parallel hash join in main-memory databases

Presents an algorithm for parallel hash-join computation on main-memory databases that adapts to data skew, and describes its implementation on the IBM RP3 multiprocessor. The algorithm exploits the random-access capabilities of main-memory databases to detect and counteract skew on the fly. Data skew is detected at run time by monitoring the observed frequencies of values of the join attribute and applying to them a threshold function that takes into account the distribution of workload among processors. If and when this threshold is reached for a value of the join attribute, the computation corresponding to that value is fragmented among an appropriate number of processors. Fragmentation requires some replication of input tuples, which modestly increases the total workload, but it reduces the completion time significantly by reducing the workload at the overloaded processor. A simplified analysis is supplemented by experiments. The description and analysis of the algorithm are based on the shared-nothing model. The implementation uses hierarchical shared memory providing non-uniform memory access.
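The skew-handling idea in the abstract can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function name `adaptive_hash_join`, the fixed integer `threshold` (the paper's threshold function also accounts for the workload distribution among processors), and the choice to spread a hot value over all processors (the paper picks "an appropriate number") are hypothetical simplifications.

```python
from collections import Counter, defaultdict


def adaptive_hash_join(build, probe, num_procs, threshold):
    """Join two lists of (key, payload) tuples, spreading hot keys.

    Processors are simulated sequentially. `threshold` is a hypothetical
    fixed cutoff standing in for the paper's threshold function.
    Returns (joined_rows, probe_load_per_processor).
    """
    # Build phase: each simulated processor owns a local hash table.
    tables = [defaultdict(list) for _ in range(num_procs)]
    for key, payload in build:
        tables[hash(key) % num_procs][key].append(payload)

    seen = Counter()       # observed frequency of each join value
    fragments = {}         # hot key -> processors sharing its work
    next_slot = Counter()  # round-robin cursor per hot key
    load = [0] * num_procs
    out = []

    for key, payload in probe:
        home = hash(key) % num_procs
        seen[key] += 1
        if key not in fragments and seen[key] > threshold:
            # Skew detected at run time: replicate the build tuples for
            # this key so other processors can take part of the work.
            # Assumption: fragment across all processors.
            fragments[key] = list(range(num_procs))
            for p in fragments[key]:
                if p != home:
                    tables[p][key] = list(tables[home][key])
        if key in fragments:
            procs = fragments[key]
            target = procs[next_slot[key] % len(procs)]
            next_slot[key] += 1
        else:
            target = home
        load[target] += 1
        # Local probe on the chosen processor.
        for b in tables[target].get(key, []):
            out.append((key, b, payload))
    return out, load
```

With a probe input dominated by one join value, the replication adds a few copies of the matching build tuples, while the probe work for that value is shared instead of piling up on a single processor. The join result is unchanged, since every probe tuple still meets a full copy of the matching build tuples.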

Arthur M. Keller | Shaibal Roy

[1] Shaibal Roy, et al. Semantic complexity of classes of relational queries and query independent data partitioning, 1991, PODS '91.

[2] Kevin P. McAuliffe, et al. The IBM Research Parallel Processor Prototype (RP3): Introduction and Architecture, 1985, ICPP.

[3] Philip S. Yu, et al. Effectiveness of Parallel Joins, 1990, IEEE Trans. Knowl. Data Eng.

[4] Arun N. Swami. A Validated Cost Model for Main Memory Databases, 1989, SIGMETRICS.

[5] Patricia G. Selinger. The Impact of Hardware on Database Systems, 1990, IBM Symposium: Database Systems of the 90s.

[6] Masaru Kitsuregawa, et al. Bucket Spreading Parallel Hash: A New, Robust, Parallel Hash Join Method for Data Skew in the Super Database Computer (SDC), 1990, VLDB.

[7] Clifford A. Lynch, et al. Selectivity Estimation and Query Optimization in Large Databases with Highly Skewed Distribution of Column Values, 1988, VLDB.

[8] Philip S. Yu, et al. An effective algorithm for parallelizing sort merge joins in the presence of data skew, 1990, DPDS '90.