Transparent Parallelism in Query Execution

A key assumption underlying query optimization schemes for parallel processing is that their cost models can anticipate the multitude of effects encountered during the execution phase. Unfortunately, this is rarely the case, and optimal processing is achieved in only a few situations. Enriching cost models with further parameters, however, increases the likelihood and extent of estimation errors and thus does not guarantee better results in general. In this paper we address the question of how to decouple optimization and execution by means of transparent parallelism, i.e., once the optimizer has determined the degree of parallelism for a group of operators, the underlying execution engine ensures optimal parallel execution without requiring any static schedule from the optimizer. Based on an analytical framework, we model both the dataflow and the processing environment for parallel query execution. On this basis we develop the notion of Non-look-ahead Optimality, which reflects an execution strategy's ability to utilize resources ad hoc. We prove a tight upper bound for the processing time of such strategies and show that they are insensitive to skew. Finally, we model several different strategies and present an execution strategy that fulfills the desired optimality criteria. As our experiments confirm, the new algorithm substantially outperforms conventional pipeline execution and is resistant to various kinds of skew.