Sample size determination for logistic regression revisited

There is no consensus on how to compute power and sample size for logistic regression. Some authors use the likelihood ratio test, some use the test on proportions, and some suggest various approximations to handle the multivariate case. We advocate the Wald test because the Z-score is routinely used to test the statistical significance of regression coefficients. The null-variance formula, which evaluates the variance under the null hypothesis, became popular in early studies. It contradicts modern software, which uses maximum likelihood estimation (MLE) and estimates the variance of the MLE at the MLE, not at the null. We derive general Wald-based power and sample size formulas for logistic regression and then apply them to a binary exposure and confounder to obtain a closed-form expression. These formulas are used to minimize the total sample size of a case-control study that achieves a given power, by optimizing the ratio of controls to cases. The optimal number of controls per case is approximately equal to the square root of the alternative odds ratio. Our sample size and power calculations can be carried out online at www.dartmouth.edu/~eugened.
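To make the Wald approach and the square-root rule concrete, here is a minimal sketch for a much simpler setting than the one treated in the paper. It assumes a case-control study with a single binary exposure and no confounder, a two-sided level-alpha Wald test of H0: beta = 0 (beta = log OR), and the variance of the log-odds-ratio MLE evaluated at the alternative, 1/(n1 p1 q1) + 1/(n0 p0 q0), where p0 and p1 are the exposure prevalences among controls and cases. This is not the author's closed-form expression or the code behind the online calculator; all function names are hypothetical.

```python
"""Wald-test power and sample size for a case-control study with one binary
exposure analysed by logistic regression (illustrative sketch only).

Assumptions (not taken from the paper): no confounder, two-sided test,
variance of the log-odds-ratio MLE evaluated at the alternative, and the
normal approximation to the Wald Z-score. Function names are hypothetical.
"""
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def exposure_prob_in_cases(p0: float, odds_ratio: float) -> float:
    """Exposure prevalence among cases implied by control prevalence p0 and the OR."""
    return odds_ratio * p0 / (1.0 - p0 + odds_ratio * p0)


def var_per_case(p0: float, odds_ratio: float, k: float) -> float:
    """n1 * Var(beta_hat) at the alternative, with k controls per case.

    Var(beta_hat) = 1/(n1 p1 q1) + 1/(n0 p0 q0), where n0 = k * n1.
    """
    p1 = exposure_prob_in_cases(p0, odds_ratio)
    return 1.0 / (p1 * (1.0 - p1)) + 1.0 / (k * p0 * (1.0 - p0))


def wald_power(n_cases: int, p0: float, odds_ratio: float,
               k: float = 1.0, alpha: float = 0.05) -> float:
    """Approximate power of the two-sided Wald test of H0: beta = 0."""
    beta = math.log(odds_ratio)
    se = math.sqrt(var_per_case(p0, odds_ratio, k) / n_cases)
    z_crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # The probability of rejecting in the wrong tail is ignored.
    return _STD_NORMAL.cdf(abs(beta) / se - z_crit)


def cases_needed(p0: float, odds_ratio: float, k: float = 1.0,
                 alpha: float = 0.05, power: float = 0.8) -> int:
    """Smallest number of cases giving the requested power (k controls per case)."""
    beta = math.log(odds_ratio)
    z_sum = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) + _STD_NORMAL.inv_cdf(power)
    return math.ceil(z_sum ** 2 * var_per_case(p0, odds_ratio, k) / beta ** 2)


def optimal_ratio(p0: float, odds_ratio: float) -> float:
    """Controls per case minimizing the total n1 * (1 + k) at fixed power.

    Minimizing (1 + k) * (a + b / k) over k gives k = sqrt(b / a)
    = sqrt(p1 q1 / (p0 q0)) = sqrt(OR) / (1 + p0 (OR - 1)),
    which is close to sqrt(OR) when exposure is rare among controls.
    """
    p1 = exposure_prob_in_cases(p0, odds_ratio)
    return math.sqrt(p1 * (1.0 - p1) / (p0 * (1.0 - p0)))


if __name__ == "__main__":
    p0, odds_ratio = 0.05, 2.0
    k_opt = optimal_ratio(p0, odds_ratio)
    n1 = cases_needed(p0, odds_ratio, k=k_opt)
    print(f"optimal controls per case: {k_opt:.3f} "
          f"(sqrt(OR) = {math.sqrt(odds_ratio):.3f})")
    print(f"cases: {n1}, power: {wald_power(n1, p0, odds_ratio, k_opt):.3f}")
```

In this simplified model the ratio k is treated as continuous, so in practice the number of controls would be rounded. The exact optimum sqrt(OR) / (1 + p0 (OR - 1)) shows why the square-root rule is only an approximation: it is accurate when p0 is small and drifts as exposure becomes common.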
