Estimating bucket accesses: A practical approach

In optimizing database queries, one inevitably encounters two important estimation problems. The first is to estimate the number of page accesses needed to select k tuples from a relation. The second is to estimate the number of distinct equijoin values that remain after selecting k tuples from a relation. The estimates depend strongly on how the tuples are distributed over the pages (first problem) and on how the equijoin values are distributed over the relation (second problem). In many practical situations it appears possible to find tight upper and lower bounds for both quantities. Results derived elsewhere appear to fall significantly outside these bounds. Finally, a time-efficient algorithm for approximating the values to be estimated is proposed.
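To make the first problem concrete, here is a minimal Python sketch. It is not the bounds or the algorithm of this paper, which the abstract does not spell out. It assumes a relation of n tuples stored on m pages of equal size (n divisible by m) and k distinct tuples selected uniformly at random without replacement. Under those assumptions it computes the trivial best-case and worst-case page counts and the textbook expected value commonly attributed to Yao. The function names and parameters are hypothetical.

```python
from math import ceil, comb


def page_access_bounds(n: int, m: int, k: int) -> tuple[int, int]:
    """Trivial bounds on pages touched when selecting k of n tuples.

    Assumption (not from the paper): m pages, each holding n // m tuples.
    """
    per_page = n // m
    lower = ceil(k / per_page)  # best case: selected tuples are packed together
    upper = min(k, m)           # worst case: every selected tuple on its own page
    return lower, upper


def expected_page_accesses(n: int, m: int, k: int) -> float:
    """Expected pages touched under uniform selection without replacement.

    Textbook formula (commonly attributed to Yao), same assumptions as above.
    """
    per_page = n // m
    # Probability that one particular page contains none of the k selected tuples.
    miss = comb(n - per_page, k) / comb(n, k)
    return m * (1.0 - miss)


if __name__ == "__main__":
    n, m, k = 100, 10, 10
    print(page_access_bounds(n, m, k))
    print(expected_page_accesses(n, m, k))
```

The expected value always lies between the two trivial bounds. How far real data falls from the uniform case depends on how the tuples are actually distributed over the pages, which is the dependence the abstract points to.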