On Estimating the Size and Confidence of a Statistical Audit

We consider the problem of statistical sampling for auditing elections, and we develop a remarkably simple and easily calculated upper bound on the sample size needed to determine, with probability at least c, whether a given set of n objects contains fewer than b "bad" objects. While the size of the optimal sample drawn without replacement can be determined with a computer program, our goal is to derive a highly accurate and simple formula that can be used by election officials equipped with only a hand-held calculator. We actually develop several formulae, but the one we recommend for use in practice is:

U3(n, b, c) = ⌈(n - (b - 1)/2) · (1 - (1 - c)^(1/b))⌉ = ⌈(n - (b - 1)/2) · (1 - exp(ln(1 - c)/b))⌉

As a practical matter, this formula is essentially exact: we prove that it is never too small, and empirical testing for many representative values of n ≤ 10,000, b ≤ n/2, and c ≤ 0.99 never finds it more than one too large. Theoretically, we show that for all n and b this formula never exceeds the optimal sample size by more than 3 for c ≤ 0.9975, and by more than (-ln(1 - c))/2 for general c.
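To make the recommended bound concrete, the following minimal Python sketch evaluates U3 and compares it with a brute-force optimal sample size. The function names u3 and exact_sample_size are hypothetical, chosen here for illustration. The brute-force criterion is an assumption about what "optimal" means: the smallest sample size u such that a sample of u objects, drawn without replacement from n objects of which exactly b are bad, contains at least one bad object with probability at least c. Floating-point rounding can shift either result by one when a value falls exactly on an integer boundary.

```python
import math


def u3(n: int, b: int, c: float) -> int:
    """Recommended bound: ceil((n - (b - 1)/2) * (1 - (1 - c)**(1/b)))."""
    return math.ceil((n - (b - 1) / 2) * (1 - math.exp(math.log(1 - c) / b)))


def exact_sample_size(n: int, b: int, c: float) -> int:
    """Brute-force optimum under the assumption stated in the lead-in.

    Returns the smallest u for which the probability that a sample of size u
    (without replacement) misses all b bad objects is at most 1 - c.
    """
    miss = 1.0  # probability that a sample of size 0 contains no bad object
    for u in range(n - b + 1):
        if miss <= 1 - c:
            return u
        # Extending the sample from u to u + 1 objects: the next draw must
        # again be one of the remaining good objects.
        miss *= (n - b - u) / (n - u)
    # A sample larger than n - b must contain at least one bad object.
    return n - b + 1


if __name__ == "__main__":
    n, b, c = 100, 10, 0.9
    print(u3(n, b, c), exact_sample_size(n, b, c))
```

For n = 100, b = 10, and c = 0.9, the formula evaluates to ⌈95.5 · 0.2057⌉ = ⌈19.64⌉ = 20, and the brute-force search also returns 20, since the miss probability is about 0.109 at u = 19 and about 0.095 at u = 20.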