On assessing binary regression models based on ungrouped data.

Assessing a binary regression model based on ungrouped data is a commonly encountered but very challenging problem. Although tests such as the Hosmer-Lemeshow test and the le Cessie-van Houwelingen test have been devised and are widely used in applications, they often have low power to detect lack of fit, and little theoretical justification has been given for when they work well. In this article, we propose a new approach based on a cross-validation voting system to address the problem. Under proper conditions, the new method comes with a theoretical guarantee that the probabilities of type I and type II errors both converge to zero as the sample size increases, and our simulation results demonstrate that it performs very well.
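For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of the standard Hosmer-Lemeshow statistic, which handles ungrouped data by binning observations on their fitted probabilities. It is not the cross-validation voting method proposed in the article. The function name `hosmer_lemeshow_statistic` and its arguments are illustrative assumptions, and the fitted probabilities are assumed to come from a model that has already been estimated.

```python
def hosmer_lemeshow_statistic(y, p, groups=10):
    """Hosmer-Lemeshow statistic for 0/1 outcomes y and fitted probabilities p.

    Illustrative sketch only: under a correctly specified model the statistic
    is usually compared with a chi-square distribution on (groups - 2)
    degrees of freedom; the p-value is not computed here.
    """
    if len(y) != len(p):
        raise ValueError("y and p must have the same length")

    # Sort observations by fitted probability so each group holds similar risks.
    pairs = sorted(zip(p, y))
    n = len(pairs)

    statistic = 0.0
    for g in range(groups):
        # Split the sorted observations into groups of (nearly) equal size.
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        expected = sum(p_i for p_i, _ in chunk)   # expected number of events
        observed = sum(y_i for _, y_i in chunk)   # observed number of events
        p_bar = expected / n_g                    # mean fitted probability
        variance = n_g * p_bar * (1.0 - p_bar)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic
```

Large values of the statistic indicate disagreement between observed and expected event counts within the groups. The choice of `groups` is left to the user, and the result can depend on it.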
