Estimating the maximum

Estimating the maximum of a sampled dataset is an important and daunting task. We give a sampling algorithm for general datasets which gives estimates strictly better than the largest sample for an infinite family of datasets. Our algorithm overshoots the true maximum of the worst case dataset with probability at most (1/e) + O( 1/k), where k is the size of our sample, which is much smaller than the size of the dataset. Our proof is the result of a new extremal graph coloring theorem: given any red/green coloring of the edges of a complete graph of n vertices, the probability that the edges among k randomly sampled vertices have a certain property is at most (1/e) + O(1/k). In addition, we show that if an algorithm gives an estimate strictly better than the largest sample for some dataset, then the algorithm overshoots the maximum on some other dataset with probability at least (1/e) - O(1/k).