Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers

Predicting the probability of the occurrence of a binary outcome or condition is important in biomedical research. While assessing discrimination is essential in developing and validating binary prediction models, less attention has been paid to methods for assessing model calibration. Calibration refers to the degree of agreement between observed and predicted probabilities and is often assessed by testing for lack‐of‐fit. The objective of our study was to examine the ability of graphical methods to assess the calibration of logistic regression models. We examined lack of internal calibration, which was related to misspecification of the logistic regression model, and lack of external calibration, which was related to an overfit model or to shrinkage of the linear predictor. We conducted an extensive set of Monte Carlo simulations with a locally weighted least squares regression smoother (i.e., the loess algorithm) to examine the ability of graphical methods to assess model calibration. We found that loess‐based methods were able to provide evidence of moderate departures from linearity and to indicate omission of a moderately strong interaction. Misspecification of the link function was harder to detect. Visual patterns were clearer with larger sample sizes, higher incidence of the outcome, or higher discrimination. Loess‐based methods were also able to identify lack of calibration in external validation samples when an overfit regression model had been used. In conclusion, loess‐based smoothing methods are adequate tools to graphically assess calibration and merit wider application. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.
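The idea of a loess‐based calibration curve can be illustrated with a minimal sketch; it is not the authors' simulation design. The data-generating model, the sample size, the span of 0.5, the local linear (degree 1) fit, and the helper names `loess_at`, `simulate`, and `calibration_curve` are all assumptions made for illustration. Rather than fitting a logistic regression, the sketch mimics an overfit model in an external validation sample by inflating the slope of the linear predictor, then smooths the observed binary outcomes against the predicted probabilities. For a well-calibrated model the smoothed curve would lie close to the diagonal.

```python
import math
import random


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def tricube(u):
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_at(x0, xs, ys, span=0.5):
    """Local linear loess estimate at x0 from the nearest span*n points."""
    n = len(xs)
    k = max(2, math.ceil(span * n))
    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
    sw = swx = swy = swxx = swxy = 0.0
    for x, y in zip(xs, ys):
        w = tricube(abs(x - x0) / h)
        if w == 0.0:
            continue
        dx = x - x0  # centre at x0 so the fitted intercept is the estimate
        sw += w
        swx += w * dx
        swy += w * y
        swxx += w * dx * dx
        swxy += w * dx * y
    det = sw * swxx - swx * swx
    if abs(det) < 1e-12:
        return swy / sw  # fall back to a weighted mean
    return (swxx * swy - swx * swxy) / det


def simulate(n=4000, seed=1, slope_inflation=2.0):
    """Hypothetical validation sample with too-extreme predictions.

    True model: logit(p) = -1 + x. The predictions use a slope that is
    slope_inflation times too large, as an overfit model might.
    """
    rng = random.Random(seed)
    preds, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_true = expit(-1.0 + x)
        preds.append(expit(-1.0 + slope_inflation * x))
        ys.append(1 if rng.random() < p_true else 0)
    return preds, ys


def calibration_curve(preds, ys, grid, span=0.5):
    """Smoothed observed proportion at each predicted probability in grid."""
    return {p: loess_at(p, preds, ys, span) for p in grid}


preds, ys = simulate()
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
curve = calibration_curve(preds, ys, grid)
for p in grid:
    print(f"predicted {p:.1f}  smoothed observed {curve[p]:.3f}")
```

In this sketch, high predicted probabilities should sit above the smoothed observed proportions and low ones below, which is the pattern expected when the linear predictor needs shrinkage. Plotting `curve` against the diagonal gives the graphical assessment; the span controls how much local detail the curve retains.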
