Parameter Estimation and Goodness‐of‐Fit in Log Binomial Regression

An estimate of the risk ratio, adjusted for confounders, can be obtained from the odds ratio of a fitted logistic regression model, but the odds ratio substantially overstates the risk ratio when the outcome is not rare. The log binomial model (binomial errors and a log link) is increasingly being used for this purpose. However, this model's performance, goodness-of-fit tests and case-wise diagnostics have not been studied. Extensive simulations are used to compare the performance of the log binomial model, a logistic regression-based method proposed by Schouten et al. (1993), and a Poisson regression approach proposed by Zou (2004) and Carter, Lipsitz, and Tilley (2005). Log binomial regression resulted in "failure" rates (non-convergence or out-of-bounds predicted probabilities) as high as 59%. The method of Schouten et al. (1993) produced fitted log binomial probabilities greater than unity in up to 19% of samples to which a log binomial model had been successfully fit, and in up to 78% of samples in which the log binomial fit failed. Similar percentages were observed for the Poisson regression approach. Coefficient and standard error estimates from the three models were similar. Rejection rates of goodness-of-fit tests for log binomial fits were around 5%. The power of goodness-of-fit tests was modest when an incorrect logistic regression model was fit. Examples demonstrate the use of the methods. Uncritical use of the log binomial regression model is not recommended.
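To make two of the compared alternatives concrete, here is a minimal NumPy sketch, not the authors' code. It assumes a design matrix whose first column is an intercept, uses plain Newton iterations without step-halving, and the function names (`fit_modified_poisson`, `fit_schouten`, `fitted_risks_above_one`) are hypothetical. The Poisson fit follows the general idea of Zou (2004): a log-link Poisson model on a 0/1 outcome with a robust (sandwich) variance. The Schouten et al. (1993) sketch reflects our reading of that method: each case is duplicated as a non-case so that a logistic model on the expanded data estimates log risk ratios, with variance clustered on subject. The log binomial fit itself is not sketched here.

```python
import numpy as np


def fit_modified_poisson(X, y, tol=1e-10, max_iter=100):
    """Log-link Poisson regression on a 0/1 outcome (hypothetical sketch).

    Returns (beta, V): beta holds log risk ratios, V is the robust
    (sandwich, HC0) covariance. Assumes column 0 of X is an intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                   # start at the crude log risk
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * mu[:, None])           # Poisson Fisher information
        step = np.linalg.solve(info, score)      # plain Newton step, no halving
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (X * mu[:, None]))
    meat = X.T @ (X * ((y - mu) ** 2)[:, None])  # empirical score variance
    return beta, bread @ meat @ bread


def fit_schouten(X, y, tol=1e-10, max_iter=100):
    """Case-duplication sketch in the spirit of Schouten et al. (1993).

    Each case is entered twice, once with outcome 1 and once with outcome 0,
    so that, in expectation, the odds in the expanded data equal the original
    risk p and the logistic coefficients estimate log risk ratios.
    The covariance is clustered on the original subject.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    cases = np.flatnonzero(y == 1)
    Xe = np.vstack([X, X[cases]])                         # originals + case copies
    ye = np.concatenate([y, np.zeros(cases.size)])        # copies coded as non-cases
    ids = np.concatenate([np.arange(X.shape[0]), cases])  # subject id of each row
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(Xe @ beta)))
        score = Xe.T @ (ye - p)
        info = Xe.T @ (Xe * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(Xe @ beta)))
    bread = np.linalg.inv(Xe.T @ (Xe * (p * (1.0 - p))[:, None]))
    per_subject = np.zeros(X.shape)
    np.add.at(per_subject, ids, Xe * (ye - p)[:, None])   # sum both rows of a case
    meat = per_subject.T @ per_subject
    return beta, bread @ meat @ bread


def fitted_risks_above_one(X, beta):
    """Count fitted log-link probabilities exp(x'beta) that exceed 1."""
    return int(np.sum(np.exp(np.asarray(X, dtype=float) @ beta) > 1.0))


if __name__ == "__main__":
    # Toy data: risk 2/10 = 0.2 when x = 0 and 6/10 = 0.6 when x = 1, so the
    # risk ratio is 3 while the crude odds ratio is (0.6/0.4)/(0.2/0.8) = 6.
    x = np.repeat([0.0, 1.0], 10)
    y = np.array([1, 1] + [0] * 8 + [1] * 6 + [0] * 4, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    for name, fit in [("modified Poisson", fit_modified_poisson),
                      ("Schouten duplication", fit_schouten)]:
        beta, V = fit(X, y)
        print(name, "RR:", np.exp(beta[1]), "SE(log RR):", np.sqrt(V[1, 1]),
              "fitted risks > 1:", fitted_risks_above_one(X, beta))
```

On this saturated toy model both sketches recover the crude risk ratio of 3, while the odds ratio is 6, illustrating the over-statement described above when the outcome is common. With continuous covariates neither fit constrains exp(x'β) to stay at or below 1, which is why counting fitted risks above one is a useful check.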

[1] H. C. Van Houwelingen et al. Risk ratio and rate ratio estimation in case-cohort designs: hypertension and cardiovascular mortality. Statistics in Medicine, 1993.

[2] O. Miettinen et al. Confounding: essence and detection. American Journal of Epidemiology, 1981.

[3] G. Zou et al. A modified Poisson regression approach to prospective studies with binary data. American Journal of Epidemiology, 2004.

[4] D. Pregibon. Logistic Regression Diagnostics. 1981.

[5] D. Hosmer et al. A comparison of goodness-of-fit tests for the logistic regression model. Statistics in Medicine, 1997.

[6] Sander Greenland et al. Model-based estimation of relative risks and other epidemiologic measures in studies of common outcomes and in case-control studies. American Journal of Epidemiology, 2004.

[7] Xiaonan Xue et al. Estimating the relative risk in cohort studies and clinical trials of common outcomes. American Journal of Epidemiology, 2003.

[8] D. Consonni et al. Estimation of prevalence rate ratios from cross-sectional data. International Journal of Epidemiology, 1995.

[9] M. R. Petersen et al. Prevalence proportion ratios: estimation and hypothesis testing. International Journal of Epidemiology, 1998.

[10] S. Lipsitz et al. Quasi-likelihood estimation for relative risk regression models. Biostatistics, 2005.

[11] T. Dwyer et al. Parental smoking and infant respiratory infection: how important is not smoking in the same room with the baby? American Journal of Public Health, 2003.

[12] S. Greenland et al. Interpretation and choice of effect measures in epidemiologic analyses. American Journal of Epidemiology, 1987.

[13] B. Efron et al. Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. 1978.

[14] D. Hosmer et al. Goodness of fit tests for the multiple logistic regression model. 1980.

[15] Gerhard Osius et al. Normal Goodness-of-Fit Tests for Multinomial Models with Large Degrees of Freedom. 1992.

[16] Joseph G. Pigeon et al. A cautionary note about assessing the fit of logistic regression models. 1999.

[17] S. Wacholder et al. Binomial regression in GLIM: estimating risk ratios and risk differences. American Journal of Epidemiology, 1986.

[18] Case-Cohort Analysis of Agricultural Pesticide Applications near Maternal Residence and Selected Causes of Fetal Death. 2001.

[19] G. Apolone et al. One model, several results: the paradox of the Hosmer-Lemeshow goodness-of-fit test for the logistic regression model. Journal of Epidemiology and Biostatistics, 2000.

[20] J. Lee et al. Odds ratio or relative risk for cross-sectional data? International Journal of Epidemiology, 1994.