Shrinkage methods enhanced the accuracy of parameter estimation using Cox models with small number of events.

OBJECTIVE: When a Cox regression analysis includes only a small number of events, it is unclear which alternative analytical strategies can be used and when they are needed. This study explores several such strategies.

STUDY DESIGN AND SETTING: Simulations and sensitivity analyses were performed on data with fewer than 10 events per predictive variable (EPV), fitting Cox models by partial likelihood (PL), Firth's penalized likelihood, or a Bayesian approach.

RESULTS: For scenarios involving binary predictors with an EPV of six or less, the simulations showed that the Firth and Bayesian approaches were more accurate than PL. The approaches performed similarly for binary predictors when the EPV was greater than six, and for continuous predictors regardless of the EPV. The bias and precision of the Bayesian parameter estimates depended on the choice of priors.

CONCLUSIONS: When the EPV is six or less, the results for categorical predictors tend to be too conservative. Firth's estimator may be a good alternative in this situation. Appropriate choices of priors when using Bayesian analysis should increase the accuracy of the parameter estimates, although this requires expertise.
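To make the comparison concrete, the following is a minimal sketch, not the study's code, of a Cox model with one binary predictor fitted by partial likelihood and by Firth's penalized likelihood, where the penalty adds half the log of the observed information to the log partial likelihood. The exponential baseline hazard, the function names, the six-subject `toy` dataset, and the bounded golden-section search are all assumptions made for illustration. Tied event times are handled with a Breslow-style risk set.

```python
import math
import random

def simulate_cox(n, beta, baseline_rate=0.1, censor_time=5.0, seed=0):
    """Draw (time, event, x) rows; exponential baseline is an assumption."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        t = -math.log(u) / (baseline_rate * math.exp(beta * x))
        rows.append((min(t, censor_time), int(t <= censor_time), x))
    return rows

def events_per_variable(rows, n_predictors):
    """EPV = number of observed events / number of predictors."""
    return sum(event for _, event, _ in rows) / n_predictors

def log_partial_likelihood(rows, beta, firth=False):
    """Log partial likelihood for one covariate, optionally Firth-penalized."""
    loglik, info = 0.0, 0.0
    for t_i, event_i, x_i in rows:
        if not event_i:
            continue
        s0 = s1 = s2 = 0.0
        for t_j, _, x_j in rows:  # risk set: subjects still under observation
            if t_j >= t_i:
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        loglik += beta * x_i - math.log(s0)
        info += s2 / s0 - (s1 / s0) ** 2  # observed information
    if firth:
        if info <= 0.0:
            return float("-inf")
        loglik += 0.5 * math.log(info)  # Firth penalty: 0.5 * log I(beta)
    return loglik

def maximize(f, lo=-10.0, hi=10.0, tol=1e-6):
    """Golden-section search for the maximum of f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

# Hypothetical data: every event occurs in the exposed group (x = 1),
# so the unpenalized partial likelihood is monotone in beta.
toy = [(1, 1, 1), (2, 1, 1), (3, 0, 0), (4, 1, 1), (5, 0, 0), (6, 0, 0)]

beta_pl = maximize(lambda b: log_partial_likelihood(toy, b))
beta_firth = maximize(lambda b: log_partial_likelihood(toy, b, firth=True))
print("EPV:", events_per_variable(toy, 1))
print("PL estimate (hits search bound):", round(beta_pl, 3))
print("Firth estimate:", round(beta_firth, 3))
```

On the toy data the unpenalized estimate runs to the upper search bound, which is how a monotone likelihood shows up in a bounded search, while the penalized estimate stays finite. Calling `simulate_cox` with different sample sizes gives datasets with different EPVs for repeating the comparison.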
