A modified score function estimator for multinomial logistic regression in small samples

Logistic regression modelling with mixed binary and continuous covariates is common in practice, but conventional estimation methods may not be feasible or appropriate in small samples. It is well known that the usual maximum likelihood estimates (MLEs) of the log-odds-ratio parameters are biased in finite samples, and that there is a non-zero probability that an MLE is infinite, i.e., does not exist. In this paper, we extend the approach proposed by Firth (Biometrika 80 (1993) 27) for bias reduction of MLEs in exponential family models to the multinomial logistic regression model, and consider general regression covariate types. The method is based on a suitable modification of the score function that removes the first-order bias. We apply the method to the analysis of two datasets: a study of disease prognosis and a disease prevention trial. In a series of small-sample simulation studies, the modified-score estimates for binomial and trinomial logistic regressions had mean bias closer to zero and smaller mean squared error than the estimates from other approaches. The modified-score estimates have properties that make them attractive for routine application in logistic regressions with binary and continuous covariates, including the advantage that they can be obtained in samples in which the MLEs are infinite.
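To make the idea of a modified score concrete, the following is a minimal sketch for the binary (not the multinomial) case only, with an intercept and a single covariate. It uses the standard form of Firth's modified score for binary logistic regression, in which each residual y_i - pi_i is augmented by h_i(1/2 - pi_i), where h_i is the leverage of observation i. The function name `firth_logistic`, the iteration settings and the toy data are illustrative assumptions and are not taken from the paper; the multinomial extension described above is not reproduced here.

```python
import math

def firth_logistic(x, y, n_iter=100, tol=1e-10):
    """Hypothetical helper: fit logit P(y=1) = b0 + b1*x by iterating on
    Firth's modified score, using the Fisher information as step matrix."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [p * (1.0 - p) for p in pi]
        # Fisher information I = X'WX for the design columns (1, x)
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det  # entries of I^{-1}
        # Leverages h_i = w_i * (1, x_i) I^{-1} (1, x_i)'
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi)
             for wi, xi in zip(w, x)]
        # Modified score: usual residual plus the bias-reducing term
        r = [yi - p + hi * (0.5 - p) for yi, p, hi in zip(y, pi, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Step: I^{-1} times the modified score
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Toy data (assumed for illustration): every subject with x = 1 has y = 1,
# so the data are separated and the ordinary MLE of b1 is infinite.
x = [0] * 5 + [1] * 5
y = [1, 1, 0, 0, 0] + [1] * 5
b0, b1 = firth_logistic(x, y)
print(b0, b1)
```

For this saturated two-group design the modified-score solution coincides with adding 1/2 to each cell of the 2 x 2 table, so the slope should agree with log((5.5 x 3.5)/(0.5 x 2.5)), a finite value even though the MLE does not exist.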
