An omnibus lack of fit test in logistic regression with sparse data

The usefulness of logistic regression depends to a great extent on the correct specification of the relation between a binary response and the characteristics of the unit on which the response is recorded. Currently used methods for testing for misspecification (lack of fit) of a proposed logistic regression model do not perform well when a data set contains almost as many distinct covariate vectors as experimental units, a condition referred to as sparsity. A new algorithm is developed that groups sparse data to create pseudo-replicates and uses them to test for lack of fit. A simulation study illustrates settings in which the new test outperforms existing ones. The new and existing lack-of-fit tests are also compared on a dataset of ages at menarche of Warsaw girls.
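The abstract does not spell out the new grouping algorithm, so the sketch below is not the authors' method. It only illustrates the setting the abstract describes, under stated assumptions: `sparsity_ratio` (a hypothetical helper) reports the number of distinct covariate vectors J divided by the number of units n, which approaches 1 under sparsity, and `hosmer_lemeshow` computes the established Hosmer-Lemeshow statistic, an existing test that sidesteps sparsity by grouping units on their fitted probabilities rather than on their covariate patterns. It assumes the fitted probabilities come from an already fitted logistic model, uses equal-size groups with no special handling of ties, and refers the statistic to a chi-square distribution with g - 2 degrees of freedom (g even, so the tail probability has a closed form).

```python
import math
from typing import Sequence, Tuple


def sparsity_ratio(covariates: Sequence[Tuple[float, ...]]) -> float:
    """Hypothetical helper: distinct covariate vectors J divided by units n.

    Values near 1 mean almost every unit has its own covariate pattern,
    i.e. the sparse setting described in the abstract.
    """
    return len(set(covariates)) / len(covariates)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with even df.

    Closed form: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(
    y: Sequence[int], p_hat: Sequence[float], groups: int = 10
) -> Tuple[float, float]:
    """Simplified Hosmer-Lemeshow statistic and its chi-square(groups - 2) p-value.

    Units are sorted by fitted probability and split into `groups` blocks of
    (nearly) equal size; ties are not treated specially. Fitted probabilities
    must lie strictly between 0 and 1.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: p_hat[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        n_g = len(idx)
        if n_g == 0:
            continue
        observed = sum(y[i] for i in idx)      # observed events in the group
        expected = sum(p_hat[i] for i in idx)  # expected events n_g * mean_p
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


if __name__ == "__main__":
    # Every unit has its own covariate vector here, so the ratio is 1.0.
    print(sparsity_ratio([(0.1,), (0.2,), (0.3,), (0.4,)]))
    print(hosmer_lemeshow([0, 1, 0, 1], [0.2, 0.4, 0.6, 0.8], groups=4))
```

With `groups=4` and four units, each group holds a single unit, which mirrors the sparse case in which a test built on covariate patterns has only one observation per cell; this is the situation in which the abstract reports that current methods perform poorly.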
