Goodness-of-fit tests for logistic regression models when data are collected using a complex sampling design

Logistic regression models are frequently used in epidemiological studies to estimate the associations of demographic, behavioral, and risk factor variables with a dichotomous outcome, such as the presence or absence of disease. After the coefficients of a logistic regression model have been estimated, the goodness of fit of the resulting model should be examined, particularly if the purpose of the model is to estimate probabilities of event occurrence. While various goodness-of-fit tests have been proposed, their properties have been studied under the assumption that the observations are independent and identically distributed. Increasingly, epidemiologists fit logistic regression models to data from large-scale sample surveys, such as the National Health Interview Survey or the National Health and Nutrition Examination Survey. Unfortunately, no goodness-of-fit testing procedures have been developed for such situations or implemented in available software. To address this problem, goodness-of-fit tests are proposed for logistic regression models fitted to data collected using complex sampling designs. The properties of the proposed tests were examined in extensive simulation studies, and the results were compared with those of traditional goodness-of-fit tests. A Stata ado function, svylogitgof, for computing the F-adjusted mean residual test after a svylogit fit is available at the author's website, http://www.people.vcu.edu/~kjarcher/Research/Data.htm.
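As a rough illustration of what a mean residual test has to work with under a sampling design, the sketch below groups observations by estimated probability into groups of approximately equal total sampling weight and computes the weighted mean residual in each group. It is a minimal sketch under stated assumptions, not the svylogitgof implementation: the function name `weighted_group_mean_residuals`, its arguments, and the midpoint grouping rule are all hypothetical, and the design-based variance estimation and F adjustment that a full test would require are omitted.

```python
def weighted_group_mean_residuals(y, p, w, groups=10):
    """Weighted mean residual (y - p) within groups of estimated risk.

    y: observed 0/1 outcomes
    p: estimated probabilities from a fitted logistic model
    w: sampling weights (assumed positive)
    groups: number of risk groups (10 gives deciles of risk)

    Hypothetical helper for illustration only; it does not estimate
    the design-based variance needed for a formal test.
    """
    if not (len(y) == len(p) == len(w)):
        raise ValueError("y, p and w must have the same length")

    total = float(sum(w))
    order = sorted(range(len(p)), key=lambda i: p[i])

    num = [0.0] * groups  # weighted residual sums per group
    den = [0.0] * groups  # weight sums per group
    cum = 0.0
    for i in order:
        # Assign by the midpoint of the observation's share of cumulative
        # weight, so each group holds roughly total / groups of the weight.
        mid = cum + w[i] / 2.0
        g = min(groups - 1, int(mid / total * groups))
        num[g] += w[i] * (y[i] - p[i])
        den[g] += w[i]
        cum += w[i]

    # A group that received no weight has no defined mean residual.
    return [n / d if d > 0 else None for n, d in zip(num, den)]
```

Weighted mean residuals close to zero in every group are what a well-fitting model would be expected to produce; judging whether observed departures are larger than chance requires a variance estimate that accounts for the strata and clusters of the design.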
