Power and Sample Size Estimation for the Wilcoxon Rank Sum Test with Application to Comparisons of C Statistics from Alternative Prediction Models

The Wilcoxon-Mann-Whitney (WMW) U test is commonly used for nonparametric two-group comparisons when the normality of the underlying distribution is questionable. There has been some previous work on estimating power for this procedure (Lehmann, 1998, Nonparametrics). In this article, we present an approach for estimating the type II error, and hence the power, of the test that is applicable to any continuous distribution, and we extend the approach to grouped continuous data, allowing for ties. We apply these results to obtain standard errors of the area under the receiver operating characteristic curve (AUROC) for risk-prediction rules under the alternative hypothesis (H1) and to compare AUROC between competing risk-prediction rules applied to the same data set. The computations rely on SAS-callable functions for evaluating the bivariate normal integral and are thus easily implemented with standard software.
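
As a rough illustration of the link between the WMW statistic and the AUROC, the sketch below estimates the AUROC as the proportion of control/case pairs in which the case scores higher, counting ties as one half, and then plugs that probability into Noether's (1987) normal approximation for WMW power. This is not the bivariate-normal method described above: Noether's formula is a simpler, widely used approximation that assumes continuous data without ties and uses the null-hypothesis variance. The function names `auroc` and `noether_power` and the example numbers are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def auroc(controls, cases):
    """Estimate P(X < Y) + 0.5 * P(X = Y) over all control/case pairs.

    This equals the Mann-Whitney U statistic divided by m * n.
    """
    wins = 0.0
    for x in controls:
        for y in cases:
            if y > x:
                wins += 1.0
            elif y == x:
                wins += 0.5  # a tie counts as half a concordant pair
    return wins / (len(controls) * len(cases))


def noether_power(p, m, n, alpha=0.05):
    """Approximate power of a two-sided WMW test (Noether's approximation).

    p is the assumed P(X < Y) under the alternative; m and n are the two
    group sizes. Assumption: only the rejection tail in the direction of
    the effect is counted, and the null variance is used throughout.
    """
    total = m + n
    c = m / total  # fraction of the total sample in the first group
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = sqrt(12 * c * (1 - c) * total) * abs(p - 0.5)
    return NormalDist().cdf(shift - z_crit)


if __name__ == "__main__":
    # Hypothetical risk scores for non-events (controls) and events (cases).
    controls = [0.10, 0.20, 0.20, 0.35]
    cases = [0.20, 0.40, 0.55, 0.70]
    print("AUROC estimate:", auroc(controls, cases))
    print("Approximate power:", noether_power(p=0.65, m=50, n=50))
```

Because the approximation ignores ties and the change in variance under the alternative, it is best read as a quick plausibility check rather than a substitute for the power calculation developed in the article.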