Firth's logistic regression with rare events: accurate effect estimates and predictions?

Firth's logistic regression has become a standard approach for the analysis of binary outcomes with small samples. Although it reduces the bias in maximum likelihood estimates of coefficients, it introduces bias towards one-half in the predicted probabilities. The stronger the imbalance of the outcome, the more severe the bias in the predicted probabilities. We propose two simple modifications of Firth's logistic regression that result in unbiased predicted probabilities. The first corrects the predicted probabilities by a post hoc adjustment of the intercept. The second is based on an alternative formulation of Firth's penalization as an iterative data augmentation procedure; our modification introduces an indicator variable that distinguishes between original and pseudo-observations in the augmented data. In a comprehensive simulation study, these approaches are compared with other attempts to improve predictions based on Firth's penalization and with other published penalization strategies intended for routine use. For instance, we consider a recently suggested compromise between maximum likelihood and Firth's logistic regression. Simulation results are scrutinized with regard to prediction and effect estimation. We find that both of our suggested methods not only give unbiased predicted probabilities but also improve the accuracy conditional on explanatory variables compared with Firth's penalization. While one method results in effect estimates identical to those of Firth's penalization, the other introduces some bias, but this is compensated by a decrease in the mean squared error. Finally, all methods considered are illustrated and compared for a study on arterial closure devices in minimally invasive cardiac surgery. Copyright © 2017 John Wiley & Sons, Ltd.
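
The first modification, the post hoc intercept adjustment, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that Firth's estimate is obtained by Fisher scoring on the modified score, sum_i (y_i - p_i + h_i (1/2 - p_i)) x_i, where h_i are the hat-matrix diagonals, and that the adjustment re-estimates only the intercept by ordinary maximum likelihood while the Firth slopes are held fixed as an offset. The function names `fit_firth` and `correct_intercept` and the toy data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_firth(X, y, max_iter=200, tol=1e-10):
    """Firth-penalized logistic regression; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)
        w = p * (1.0 - p)
        info = X.T @ (w[:, None] * X)          # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Hat-matrix diagonals h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def correct_intercept(X, y, beta, max_iter=200, tol=1e-10):
    """Re-estimate the intercept by ML, keeping the slopes fixed (assumed form of the adjustment)."""
    offset = X[:, 1:] @ beta[1:]               # fixed contribution of the Firth slopes
    b0 = beta[0]
    for _ in range(max_iter):
        p = sigmoid(b0 + offset)
        step = np.sum(y - p) / np.sum(p * (1.0 - p))   # Newton step for the intercept
        b0 += step
        if abs(step) < tol:
            break
    return np.concatenate(([b0], beta[1:]))


# Toy data (made up for illustration): 8 events among 200 observations.
x = np.linspace(-2.0, 2.0, 200)
y = np.zeros(200)
y[[120, 150, 170, 180, 185, 190, 195, 199]] = 1.0
X = np.column_stack([np.ones_like(x), x])

beta_firth = fit_firth(X, y)
beta_corrected = correct_intercept(X, y, beta_firth)

p_firth = sigmoid(X @ beta_firth)
p_corrected = sigmoid(X @ beta_corrected)
print("observed event rate:      ", y.mean())
print("mean prediction, Firth:   ", p_firth.mean())
print("mean prediction, adjusted:", p_corrected.mean())
```

On rare-event data such as this toy example, the average Firth prediction lies above the observed event rate, which is the bias towards one-half described above. After the intercept is re-estimated, the average prediction matches the observed rate and the slope estimates are unchanged.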
