Hoeffding inequalities for join-selectivity estimation and online aggregation

We extend Hoeffding's inequalities for simple averages of random variables to the case of cross-product averages. We also survey some new and existing Hoeffding inequalities for estimators of the mean, variance, and standard deviation of a subpopulation. These results are applicable to two problems in object-relational database management systems: fixed-precision estimation of the selectivity of a join and online processing of aggregation queries. For the first problem, the new results can be used to modify the asymptotically efficient sampling-based procedures of Haas, Naughton, Seshadri, and Swami so that there is a guaranteed upper bound on the number of sampling steps. For the second problem, the inequalities can be used to develop conservative confidence intervals for online aggregation; such intervals avoid the large intermediate storage requirements and undercoverage problems of intervals based on large-sample theory.
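As a point of reference for the two applications, the following minimal sketch uses only the classical Hoeffding inequality for a simple average of independent observations bounded in [lo, hi], namely P(|mean - mu| >= eps) <= 2 exp(-2 n eps^2 / (hi - lo)^2). It does not implement the cross-product extension described above. The function names, parameter names, and the default range [0, 1] are illustrative assumptions, not notation from the paper.

```python
import math


def hoeffding_half_width(n, delta, lo=0.0, hi=1.0):
    """Conservative confidence-interval half-width after n observations.

    Assumes independent observations bounded in [lo, hi] (illustrative
    assumption). Returns eps such that the sample mean is within eps of
    the true mean with probability at least 1 - delta.
    """
    if n <= 0 or not (0.0 < delta < 1.0) or hi <= lo:
        raise ValueError("need n > 0, 0 < delta < 1, and hi > lo")
    return (hi - lo) * math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def hoeffding_sample_size(eps, delta, lo=0.0, hi=1.0):
    """Smallest n for which the half-width above is at most eps.

    This gives a deterministic upper bound on the number of sampling
    steps needed for precision eps at confidence 1 - delta.
    """
    if eps <= 0.0 or not (0.0 < delta < 1.0) or hi <= lo:
        raise ValueError("need eps > 0, 0 < delta < 1, and hi > lo")
    return math.ceil((hi - lo) ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))


if __name__ == "__main__":
    # Hypothetical target: precision 0.05 at 95% confidence on [0, 1] data.
    n_max = hoeffding_sample_size(eps=0.05, delta=0.05)
    print("sampling steps needed:", n_max)
    print("half-width at that n:", hoeffding_half_width(n_max, delta=0.05))
```

The sample-size function illustrates the fixed-precision use (a bound on sampling steps known in advance), while the half-width function illustrates the online use: the interval can be recomputed from the running count n alone and shrinks at rate 1/sqrt(n) without relying on large-sample approximations.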