A permutation test for inference in logistic regression with small‐ and moderate‐sized data sets

Inference based on large sample results can be highly inaccurate if applied to logistic regression with small data sets. Furthermore, maximum likelihood estimates for the regression parameters will on occasion not exist, and large sample results will be invalid. Exact conditional logistic regression is an alternative that can be used whether or not maximum likelihood estimates exist, but can be overly conservative. This approach also requires grouping the values of continuous variables corresponding to nuisance parameters, and inference can depend on how this is done. A simple permutation test of the hypothesis that a regression parameter is zero can overcome these limitations. The variable of interest is replaced by the residuals from a linear regression of it on all other independent variables. Logistic regressions are then done for permutations of these residuals, and a p-value is computed by comparing the resulting likelihood ratio statistics to the original observed value. Simulations of binary outcome data with two independent variables that have binary or lognormal distributions yield the following results: (a) in small data sets consisting of 20 observations, type I error is well-controlled by the permutation test, but poorly controlled by the asymptotic likelihood ratio test; (b) in large data sets consisting of 1000 observations, performance of the permutation test appears equivalent to that of the asymptotic test; and (c) in small data sets, the p-value for the permutation test is usually similar to the mid-p-value for exact conditional logistic regression.
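The procedure described above (residualize the variable of interest, permute the residuals, refit, and compare likelihood ratio statistics) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `fit_logistic_deviance` and `permutation_lr_test`, the use of randomly sampled permutations rather than full enumeration, the "add one" p-value convention, and the numerical safeguards (clipped linear predictor, tiny ridge term) that keep the fit finite when maximum likelihood estimates do not exist are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic_deviance(X, y, max_iter=50, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson and return the deviance.

    Assumption: the linear predictor is clipped and a tiny ridge term is added
    to the Hessian so the routine still returns a finite deviance when the
    data are separated and the estimates diverge.
    """
    k = X.shape[1]
    beta = np.zeros(k)
    dev_old = np.inf
    dev = np.inf
    for _ in range(max_iter):
        eta = np.clip(X @ beta, -30.0, 30.0)
        p = 1.0 / (1.0 + np.exp(-eta))
        dev = -2.0 * np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
        if abs(dev_old - dev) < tol:
            break
        dev_old = dev
        w = p * (1.0 - p)
        hessian = X.T @ (w[:, None] * X) + 1e-10 * np.eye(k)
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return dev

def permutation_lr_test(y, x, Z, n_perm=999, seed=0):
    """Permutation test of H0: the coefficient of x is zero, adjusting for Z.

    Returns the observed likelihood ratio statistic and a Monte Carlo p-value.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    Z1 = np.column_stack([np.ones(n), Z])  # intercept plus the other covariates

    # Replace x by its residuals from a linear regression on the other covariates.
    coef, *_ = np.linalg.lstsq(Z1, x, rcond=None)
    resid = x - Z1 @ coef

    dev_null = fit_logistic_deviance(Z1, y)  # model without the variable of interest

    def lr_stat(r):
        dev_full = fit_logistic_deviance(np.column_stack([Z1, r]), y)
        return max(dev_null - dev_full, 0.0)

    observed = lr_stat(resid)
    permuted = np.array([lr_stat(rng.permutation(resid)) for _ in range(n_perm)])

    # Assumed convention: count the observed statistic as one of the permutations.
    p_value = (1 + np.sum(permuted >= observed - 1e-9)) / (n_perm + 1)
    return observed, p_value

# Usage on simulated data: 20 observations, one binary and one lognormal covariate.
rng = np.random.default_rng(1)
z = rng.binomial(1, 0.5, size=20).astype(float)
x = rng.lognormal(size=20)
y = rng.binomial(1, 0.5, size=20).astype(float)
stat, p = permutation_lr_test(y, x, z, n_perm=499, seed=2)
print(f"LR statistic = {stat:.3f}, permutation p-value = {p:.3f}")
```

Because the residuals and the original variable span the same column space once the other covariates are in the model, the observed statistic equals the usual likelihood ratio statistic for the variable of interest; only the reference distribution changes.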
