Methods of correcting for multiple testing: operating characteristics.

We examine the operating characteristics of 17 methods for correcting p-values for multiple testing on synthetic data with known statistical properties. These methods are derived p-values only and not the raw data. With the test cases, we systematically varied the number of p-values, the proportion of false null hypotheses, the probability that a false null hypothesis would result in a p-value less than 5 per cent and the degree of correlation between p-values. We examined the effect of each of these factors on family-wise and false negative error rates and compared the false negative error rates of methods with an acceptable family-wise error. Only four methods were not bettered in this comparison. Unfortunately, however, a uniformly best method of those examined does not exist. A suggested strategy for examining corrections uses a succession of methods that are increasingly lax in family-wise error. A computer program for these corrections is available.