Aggregates are rife in real-life SQL queries. However, aggregate processing has received surprisingly little attention in the parallel query processing literature; furthermore, the way current parallel database systems process aggregates is far from optimal in many scenarios. We describe two hashing-based algorithms for the parallel evaluation of aggregates. A performance analysis, via an analytical model and an implementation on the Intel Paragon multicomputer, shows that each algorithm works well for some aggregation selectivities but poorly for the rest. Fortunately, where one does poorly the other does well, and vice versa; together, the two cover all possible selectivities. We show how an optimizer can use sampling to decide which of the two algorithms to use for a particular query. Finally, we investigate the impact of data skew on the performance of these algorithms.
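The abstract does not spell out the two algorithms. The following is a minimal single-process sketch that assumes they resemble two common hash-based strategies for a SUM ... GROUP BY. The first, here called `two_phase`, aggregates locally on each node and then hash-partitions the partial results by group key for a final merge. The second, here called `repartition`, hash-partitions the raw tuples by group key and aggregates once at the destination. Nodes are simulated as Python lists. `choose_algorithm`, its sample size and its 0.5 threshold are hypothetical stand-ins for the sampling-based optimizer decision. The distinct-ratio heuristic is only a crude substitute for a proper distinct-count estimator.

```python
import random
from collections import defaultdict


def local_aggregate(rows):
    """Compute SUM(value) per group key over one node's rows."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return acc


def two_phase(partitions, n_nodes):
    """Assumed strategy 1: local pre-aggregation, then merge partials by key hash."""
    partials = [local_aggregate(p) for p in partitions]  # phase 1, one per node
    buckets = [defaultdict(int) for _ in range(n_nodes)]
    for partial in partials:  # phase 2: ship partial sums to the key's owner
        for key, subtotal in partial.items():
            buckets[hash(key) % n_nodes][key] += subtotal
    return buckets


def repartition(partitions, n_nodes):
    """Assumed strategy 2: ship every raw tuple to its key's owner, aggregate once."""
    inboxes = [[] for _ in range(n_nodes)]
    for p in partitions:
        for key, value in p:
            inboxes[hash(key) % n_nodes].append((key, value))
    return [local_aggregate(inbox) for inbox in inboxes]


def merge(buckets):
    """Combine per-node results; each key lives on exactly one node."""
    out = {}
    for b in buckets:
        out.update(b)
    return out


def choose_algorithm(partitions, sample_size=1000, threshold=0.5, seed=0):
    """Hypothetical optimizer rule: few groups relative to the sample favor two_phase."""
    rng = random.Random(seed)
    rows = [r for p in partitions for r in p]  # simplification: sample globally
    sample = rng.sample(rows, min(sample_size, len(rows)))
    distinct_ratio = len({k for k, _ in sample}) / max(1, len(sample))
    return "two_phase" if distinct_ratio < threshold else "repartition"


if __name__ == "__main__":
    rng = random.Random(42)
    few = [[(rng.randrange(5), 1) for _ in range(1000)] for _ in range(4)]
    many = [[(rng.randrange(10**6), 1) for _ in range(1000)] for _ in range(4)]
    assert merge(two_phase(few, 4)) == merge(repartition(few, 4))
    print(choose_algorithm(few), choose_algorithm(many))
```

Both strategies produce the same answer. They differ in what crosses the network. When there are few groups, local pre-aggregation shrinks each node's data to a handful of partial sums before redistribution. When nearly every tuple is its own group, that first pass saves little, and shipping raw tuples once avoids the duplicated work.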