Analyzing hospitalization data: potential limitations of Poisson regression.

BACKGROUND: Poisson regression is commonly used to analyze hospitalization data when outcomes are expressed as counts (e.g. number of days in hospital). However, such data often violate the assumptions on which Poisson regression is based. Extensions of the model that are better suited to these data are available but rarely used.

METHODS: We compared hospitalization data between 206 patients treated with hemodialysis (HD) and 107 treated with peritoneal dialysis (PD). We analyzed the data with standard Poisson regression and with three other approaches for modeling count data: negative binomial (NB) regression, zero-inflated Poisson (ZIP) regression and zero-inflated negative binomial (ZINB) regression. We examined the appropriateness of each model and compared the results obtained with each approach.

RESULTS: During a mean 1.9 years of follow-up, 183 of 313 patients (58%) were never hospitalized, indicating an excess of zeros. The data also displayed overdispersion (variance greater than the mean), violating another assumption of the Poisson model. Using four criteria, we determined that the NB and ZINB models performed best. According to these two models, patients treated with HD had hospitalization rates similar to those of patients receiving PD {NB rate ratio (RR): 1.04 [bootstrapped 95% confidence interval (CI): 0.49-2.20]; ZINB summary RR: 1.21 (bootstrapped 95% CI 0.60-2.46)}. The Poisson and ZIP models fit the data poorly and gave much larger point estimates than the NB and ZINB models [Poisson RR: 1.93 (bootstrapped 95% CI 0.88-4.23); ZIP summary RR: 1.84 (bootstrapped 95% CI 0.88-3.84)].

CONCLUSIONS: Results from modeling hospitalization data differed substantially depending on the approach used. Our results argue strongly for a sound model selection process and for improved reporting of the statistical methods used to model count data.
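The two violations described above, overdispersion and excess zeros, can be screened for before any model is fitted. The following is a minimal sketch in plain Python, not the authors' method: the function name `count_diagnostics` and the example counts are hypothetical and made up purely for illustration, and the check ignores covariates and differing follow-up time. It compares the sample variance with the mean, and the observed share of zeros with the share a Poisson distribution with the same mean would predict, exp(-mean).

```python
import math


def count_diagnostics(counts):
    """Crude screening statistics for a list of non-negative counts.

    Hypothetical helper for illustration only; it does not adjust for
    covariates or for unequal follow-up time.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(counts) / n
    if mean == 0:
        raise ValueError("all counts are zero; dispersion ratio is undefined")
    # Sample variance with the usual n - 1 denominator.
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    observed_zero = sum(1 for c in counts if c == 0) / n
    # Under a Poisson distribution with this mean, P(Y = 0) = exp(-mean).
    poisson_expected_zero = math.exp(-mean)
    return {
        "mean": mean,
        "variance": variance,
        "dispersion_ratio": variance / mean,  # about 1 under Poisson
        "observed_zero": observed_zero,
        "poisson_expected_zero": poisson_expected_zero,
    }


# Made-up hospital-day counts for ten imaginary patients (not study data).
example_days = [0, 0, 0, 0, 0, 0, 2, 5, 14, 30]
diag = count_diagnostics(example_days)
for name, value in diag.items():
    print(f"{name}: {value:.4f}")
```

A dispersion ratio well above 1, or an observed zero fraction well above the Poisson-expected one, suggests that NB or zero-inflated models deserve consideration. Formal model selection would still rely on fitted-model criteria such as those the abstract refers to.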
