Paper Information - Processing Aggregates in Parallel Database Systems

Processing Aggregates in Parallel Database Systems

Aggregates are rife in real-life SQL queries. However, aggregate processing has received surprisingly little attention in the parallel query processing literature; furthermore, the way current parallel database systems process aggregates is far from optimal in many scenarios. We describe two hashing-based algorithms for parallel evaluation of aggregates. A performance analysis, via an analytical model and an implementation on the Intel Paragon multi-computer, shows that each works well for some aggregation selectivities but poorly for the rest. Fortunately, where one does poorly the other does well, and vice versa. Thus, the two together cover all possible selectivities. We show how, using sampling, an optimizer can decide which of the two algorithms to use for a particular query. Finally, we investigate the impact of data skew on the performance of these algorithms.

Ambuj Shatdal | Jeffrey F. Naughton
