The Effect of Bucket Size Tuning in the Dynamic Hybrid GRACE Hash Join Method

In this paper, we present a detailed analysis and performance evaluation of the Dynamic Hybrid GRACE Hash Join Method (DHGH Method) when the tuple distribution in buckets is unbalanced. Conventional hash join methods specify the tuple distribution in buckets statically. However, the actual distribution may differ from the estimate, since join operations are applied together with selection operations. When the tuple distribution in buckets is unbalanced, a join operation processed with the Hybrid Hash Join Method (HH Method) costs more than in the ideal case. With the DHGH Method, on the other hand, the buckets to be destaged are selected dynamically, which gives the same performance as the ideal case even if the tuple distribution in buckets is unbalanced, as in Zipf-like distributions. We analyze the total I/O cost of a join operation for various numbers of buckets. The results show that the number of buckets should be determined based on the tuple distribution in buckets rather than on the size of the source relation. They also show that it is better to partition the source relation into a large number of small buckets than into the smaller number of buckets, each almost filling the whole main memory, that the HH Method adopts.
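The idea of selecting destaging buckets dynamically can be illustrated with a small simulation. The following is a minimal sketch, not the DHGH Method as specified in the paper: the function names (`zipf_like_bucket_stream`, `dynamic_destage`), the policy of destaging the largest resident bucket when memory fills, and the use of "tuples written to disk" as a stand-in for I/O cost are all assumptions made for illustration.

```python
import random
from collections import Counter


def zipf_like_bucket_stream(num_tuples, num_buckets, skew=1.0, seed=0):
    """Return a list of bucket ids whose frequencies follow a Zipf-like law.

    Hypothetical helper: bucket with rank r gets weight 1 / r**skew.
    """
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, num_buckets + 1)]
    return rng.choices(range(num_buckets), weights=weights, k=num_tuples)


def dynamic_destage(bucket_stream, memory_tuples):
    """Simulate partitioning where buckets are destaged only when memory fills.

    Assumed policy (not taken from the paper): when memory overflows, the
    largest resident bucket is written to disk, and later tuples of that
    bucket go straight to disk.

    Returns (tuples_written_to_disk, set_of_resident_bucket_ids).
    """
    resident = Counter()  # bucket id -> tuples currently held in memory
    destaged = set()      # bucket ids already moved to disk
    written = 0           # proxy for partitioning-phase write I/O
    in_memory = 0

    for bucket in bucket_stream:
        if bucket in destaged:
            written += 1
            continue
        resident[bucket] += 1
        in_memory += 1
        if in_memory > memory_tuples:
            # Memory is full: pick a victim bucket at run time.
            victim, size = resident.most_common(1)[0]
            del resident[victim]
            destaged.add(victim)
            written += size
            in_memory -= size

    return written, set(resident)


if __name__ == "__main__":
    # Vary the number of buckets for a fixed relation and memory size.
    for num_buckets in (4, 16, 64):
        stream = zipf_like_bucket_stream(10_000, num_buckets, skew=1.0, seed=1)
        written, resident = dynamic_destage(stream, memory_tuples=2_000)
        print(f"buckets={num_buckets} written={written} resident={len(resident)}")
```

Running the script prints, for each bucket count, how many tuples were written to disk and how many buckets stayed in memory, which gives a rough feel for how bucket granularity interacts with a skewed distribution. The numbers depend entirely on the assumed victim policy and random seed, so they should not be read as a reproduction of the paper's I/O cost analysis.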