Least expected cost query optimization: what can we expect?

A standard assumption in the database query optimization literature is that it suffices to optimize for the "typical" case---that is, the case in which various parameters (e.g., the amount of available memory, the selectivities of predicates, etc.) take on their "typical" values. It was claimed in [CHS99] that we could do better by choosing plans based on their expected cost. Here we investigate this issue more thoroughly. We show that in many circumstances of interest, a "typical" value of the parameter often does give acceptable answers, provided that it is chosen carefully and we are interested only in minimizing expected running time. However, by minimizing the expected running time, we are effectively assuming that if plan p1 runs three times as long as plan p2, then p1 is exactly three times as bad as p2. An assumption like this is not always appropriate. We show that focusing on least expected cost can lead to significant improvement for a number of cost functions of interest.