A Binary Regression Adaptive Goodness-of-fit Test (BAGofT)

Pearson's $\chi^2$ test and the residual deviance test are two classical goodness-of-fit tests for binary regression models such as logistic regression. Neither test can be applied when the data contain one or more continuous covariates, a common situation in practice. In that case, the most widely used approach is the Hosmer-Lemeshow test, which partitions the covariate space into groups according to quantiles of the fitted probabilities from all the observations. However, its grouping scheme is not flexible enough to partition the data space adversarially so as to enhance power. In this work, we propose a new methodology, named the binary regression adaptive grouping goodness-of-fit test (BAGofT), to address this concern. It is a two-stage procedure: the first stage adaptively selects candidate partitions using "training" data, and the second stage performs $\chi^2$ tests, with the necessary corrections, on "test" data. Proper data splitting ensures that the test has desirable size and power properties. In our experiments, BAGofT performs much better than the Hosmer-Lemeshow test in many situations.
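To make the quantile-based grouping of the Hosmer-Lemeshow test concrete, the following is a minimal sketch of the textbook form of its statistic; it is not the BAGofT procedure and not the authors' code. The function name `hosmer_lemeshow_statistic`, the default of ten groups, and the reference degrees of freedom `n_groups - 2` are assumptions chosen for illustration.

```python
import numpy as np


def hosmer_lemeshow_statistic(y, p_hat, n_groups=10):
    """Textbook Hosmer-Lemeshow statistic (illustrative sketch).

    y        : binary responses (0/1)
    p_hat    : fitted probabilities from the binary regression model
    n_groups : number of quantile groups (assumed default: 10)

    Assumes at least n_groups observations and that no group has an
    average fitted probability of exactly 0 or 1.
    """
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)

    # Sort observations by fitted probability and cut them into
    # (nearly) equal-sized groups, i.e. groups defined by quantiles.
    order = np.argsort(p_hat, kind="stable")
    groups = np.array_split(order, n_groups)

    stat = 0.0
    for idx in groups:
        n_g = idx.size
        observed = y[idx].sum()      # observed number of successes
        expected = p_hat[idx].sum()  # expected number of successes
        mean_p = expected / n_g      # average fitted probability
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))

    # The statistic is conventionally compared with a chi-square
    # distribution with n_groups - 2 degrees of freedom.
    return stat, n_groups - 2


if __name__ == "__main__":
    # Simulated data from a correctly specified logistic model.
    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    p = 1.0 / (1.0 + np.exp(-(0.5 + x)))
    y = rng.binomial(1, p)
    print(hosmer_lemeshow_statistic(y, p))
```

Because the groups depend only on the ordering of the fitted probabilities, the partition cannot adapt to the direction in which the model is misspecified, which is the limitation the abstract refers to.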
