Increasing the power: A practical approach to goodness-of-fit test for logistic regression models with continuous predictors

When continuous predictors are present, the classical Pearson and deviance goodness-of-fit tests for logistic regression break down. The Hosmer-Lemeshow test can be used in these situations. Although it is simple to perform and widely used, it often lacks power and gives no information about the source of any lack of fit it detects. Tsiatis proposed a score statistic to test for covariate regional effects. While conceptually elegant, the approach offers no general rule for partitioning the covariate space, which has to some degree limited its popularity. We propose a new goodness-of-fit method that partitions the covariate space with a very general strategy (clustering) and then applies either a Pearson statistic or a score statistic. We discuss properties of the proposed statistics, and a simulation study demonstrates increased power to detect model misspecification in a variety of settings. An application of these methods to data from a clinical trial illustrates their use. We also discuss further improvements to the proposed tests and extensions of the method to other data settings, such as ordinal response regression models.
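The clustering-plus-score-test idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses k-means (Lloyd's algorithm) as the clustering method, because the abstract does not name one. It builds indicators for all but one cluster and computes a Tsiatis-style score statistic for adding those indicators to the fitted logistic model. It then refers the statistic to a chi-square distribution with G - 1 degrees of freedom, where G is the number of non-empty clusters. All function names (`fit_logistic`, `kmeans`, `cluster_score_test`, `chi2_sf`, and the helpers) are hypothetical. The Pearson-statistic variant mentioned in the abstract is not shown.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def weighted_cross(A, B, w):
    # A' diag(w) B for row-major matrices A (n x p) and B (n x q).
    return [[sum(w[i] * A[i][a] * B[i][b] for i in range(len(w)))
             for b in range(len(B[0]))] for a in range(len(A[0]))]

def fit_logistic(X, y, max_iter=50, tol=1e-10):
    # Maximum likelihood via Newton-Raphson; X must contain an intercept column.
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        p = [sigmoid(sum(xj * bj for xj, bj in zip(row, beta))) for row in X]
        grad = [sum(X[i][j] * (y[i] - p[i]) for i in range(len(y))) for j in range(len(beta))]
        step = solve(weighted_cross(X, X, [pi * (1 - pi) for pi in p]), grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

def chi2_sf(x, df):
    # Exact chi-square upper-tail probability for integer df >= 1.
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:  # Q(df/2, h) = exp(-h) * sum_{j < df/2} h^j / j!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_j h^(j + 1/2) / Gamma(j + 3/2).
    total = math.erfc(math.sqrt(h))
    term = 2.0 * math.sqrt(h / math.pi)
    for k in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h) * term
        term *= h / (k + 0.5)
    return total

def kmeans(points, k, iters=100, seed=0):
    # Assumption: k-means (Lloyd's algorithm) stands in for "clustering";
    # initial centers are k data points drawn with a seeded RNG.
    rng = random.Random(seed)
    centers = [list(pt) for pt in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda j: sum((a - c) ** 2 for a, c in zip(pt, centers[j])))
               for pt in points]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_score_test(X, y, labels):
    # Score test of H0: no cluster (regional) effects beyond the fitted model.
    n = len(y)
    beta = fit_logistic(X, y)
    p = [sigmoid(sum(xj * bj for xj, bj in zip(row, beta))) for row in X]
    w = [pi * (1 - pi) for pi in p]
    groups = sorted(set(labels))
    # Indicators for every non-empty cluster except the first (absorbed by the intercept).
    Z = [[1.0 if lab == g else 0.0 for g in groups[1:]] for lab in labels]
    q = len(groups) - 1
    U = [sum(Z[i][a] * (y[i] - p[i]) for i in range(n)) for a in range(q)]
    XWX, XWZ, ZWZ = weighted_cross(X, X, w), weighted_cross(X, Z, w), weighted_cross(Z, Z, w)
    B = [solve(XWX, [row[b] for row in XWZ]) for b in range(q)]  # (X'WX)^-1 X'WZ, by column
    # V = Z'WZ - Z'WX (X'WX)^-1 X'WZ is the variance of U under H0.
    V = [[ZWZ[a][b] - sum(XWZ[r][a] * B[b][r] for r in range(len(XWX))) for b in range(q)]
         for a in range(q)]
    stat = sum(u * v for u, v in zip(U, solve(V, U)))
    return stat, q, chi2_sf(stat, q)  # reference: chi-square with q = G - 1 df (assumed)
```

Consider data generated from the quadratic logit -1 + 2x² with x uniform on (-2, 2). If you fit a model that is linear in x and cluster x into five groups, the test should give a very small p-value. Observed event rates in the central cluster fall well below the fitted ones. The clusters are built from the covariates alone, so the partition can be treated as fixed given the covariates. That is one plausible rationale for the usual chi-square reference. Covariates on very different scales should be standardized before clustering.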

[1] Geoffrey Stuart Watson, et al. Some Recent Results in Chi-Square Goodness-of-Fit Tests, 1959.

[2] C. Manski, et al. The Logit Model and Response-Based Samples, 1989.

[3] D. Hosmer, et al. Applied Logistic Regression, 1991.

[4] H. White. Maximum Likelihood Estimation of Misspecified Models, 1982.

[5] H. Chernoff, et al. The Use of Maximum Likelihood Estimates in χ² Tests for Goodness of Fit, 1954.

[6] D. Hosmer, et al. A comparison of goodness-of-fit tests for the logistic regression model, 1997, Statistics in Medicine.

[7] J. Anderson. Separate sample logistic discrimination, 1972.

[8] Timothy R. C. Read, et al. Goodness-of-Fit Statistics for Discrete Multivariate Data, 1988.

[9] P. McCullagh, et al. Generalized Linear Models, 1972, Predictive Analytics.

[10] W. W. Hauck, et al. A consequence of omitted covariates when estimating odds ratios, 1991, Journal of Clinical Epidemiology.

[11] G. W. Milligan, et al. A Monte Carlo study of thirty internal criterion measures for cluster analysis, 1981.

[12] A. Agresti, et al. Categorical Data Analysis, 1991, International Encyclopedia of Statistical Science.

[13] Joseph G. Pigeon, et al. A cautionary note about assessing the fit of logistic regression models, 1999.

[14] H. Barnhart, et al. Goodness-of-fit tests for GEE modeling with binary responses, 1998, Biometrics.

[15] D. A. Williams, et al. Generalized Linear Model Diagnostics Using the Deviance and Single Case Deletions, 1987.

[16] N. Nagelkerke, et al. Logistic regression in case-control studies: the effect of using independent as dependent variables, 1995, Statistics in Medicine.

[17] A. Tsiatis. A note on a goodness-of-fit test for the logistic regression model, 1980.

[18] David R. Cox. The analysis of binary data, 1970.

[19] B. Hindman, et al. Mild intraoperative hypothermia during surgery for intracranial aneurysm, 2005, The New England Journal of Medicine.

[20] P. J. Catalano, et al. Goodness-of-fit for GEE: an example with mental health service utilization, 1999, Statistics in Medicine.

[21] Biao Zhang, et al. An information matrix test for logistic regression models based on case-control data, 2001.

[22] Erik Pulkstenis, et al. Two goodness-of-fit tests for logistic regression models with continuous covariates, 2002, Statistics in Medicine.

[23] Timothy R. C. Read, et al. Multinomial goodness-of-fit tests, 1984.

[24] J. B. Copas, et al. Unweighted Sum of Squares Test for Proportions, 1989.

[25] Nils Lid Hjort, et al. Goodness-of-fit processes for logistic regression: simulation results, 2002, Statistics in Medicine.

[26] M. H. Gail, et al. Tests for no treatment effect in randomized clinical trials, 1988.

[27] G. W. Milligan, et al. An examination of the effect of six types of error perturbation on fifteen clustering algorithms, 1980.

[28] J. C. van Houwelingen, et al. A goodness-of-fit test for binary regression models, based on smoothing methods, 1991.

[29] Luciano Molinari. Distribution of the chi-squared test in nonstandard situations, 1977.

[30] Calyampudi R. Rao, et al. Linear Statistical Inference and Its Applications, 1975.

[31] D. Hosmer, et al. Goodness of fit tests for the multiple logistic regression model, 1980.

[32] Jeroen Smits, et al. Testing goodness-of-fit of the logistic regression model in case-control studies using sample reweighting, 2005, Statistics in Medicine.

[33] Peter McCullagh, et al. On the asymptotic distribution of Pearson's statistic in linear exponential-family models, 1985.

[34] Oliver Kuss, et al. Global goodness-of-fit tests in logistic regression with sparse data, 2002, Statistics in Medicine.

[35] N. Jewell, et al. Some surprising results about covariate adjustment in logistic regression models, 1991.

[36] Gerhard Osius, et al. Normal Goodness-of-Fit Tests for Multinomial Models with Large Degrees of Freedom, 1992.

[37] G. S. Watson, et al. On Chi-Square Goodness-of-Fit Tests for Continuous Distributions, 1958.

[38] Chris D. Orme, et al. The Calculation of the Information Matrix Test for Binary Data Models, 1988.

[39] M. Gail, et al. Biased estimates of treatment effect in randomized experiments with nonlinear regressions and omitted covariates, 1984.

[40] J. Copas. Plotting p against x, 1983.

[41] C. P. Farrington, et al. On assessing goodness of fit of generalized linear models to sparse data, 1996.

[42] Peter McCullagh, et al. The Conditional Distribution of Goodness-of-Fit Statistics for Discrete Data, 1986.