Explained Variation for Logistic Regression - Small Sample Adjustments, Confidence Intervals and Predictive Precision

Summary

The proportion of explained variation in logistic regression can be expressed by the multiple R² originally developed for the general linear model (cf. Mittlböck and Schemper (1996)). In this paper we present a detailed investigation of this measure in small samples and/or with many covariates, and propose two alternative adjustments: one is a direct analogue of the adjusted R² of the general linear model, and the other is based on shrinkage. Furthermore, we explore the use of bootstrap confidence intervals and give a table of the expected variability of estimates of explained variation for samples of varying sizes. We recommend quantifying gains in predictive precision due to prognostic factors by both relative and absolute measures. For binary outcomes, the components of the relative measure, R², are suitable absolute measures of predictive precision. They are interpretable as average absolute residuals with and without the use of prognostic factors. We motivate the application of the presented measures by the statistical analysis of a study of physical characteristics of urine possibly related to the presence of calcium oxalate crystals.
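As a minimal sketch of the quantities named in the summary, the following Python snippet computes a sum-of-squares R² from fitted probabilities and then applies a small-sample correction. The summary describes the first adjustment only as a direct analogue of the linear-model adjustment, so the formula 1 - (n-1)/(n-k-1) * (1 - R²) used here is an assumption based on the familiar linear-model form, not a quotation from the paper. The function names and the toy data are hypothetical, and the fitted probabilities are made-up values rather than output of a fitted model.

```python
import numpy as np


def r2_ss(y, p_hat):
    """Sum-of-squares R^2: 1 - SSE/SST, using fitted probabilities as predictions."""
    y = np.asarray(y, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    sse = np.sum((y - p_hat) ** 2)      # residual variation given the covariates
    sst = np.sum((y - y.mean()) ** 2)   # residual variation without covariates
    return 1.0 - sse / sst


def r2_adjusted(r2, n, k):
    """Assumed linear-model-style adjustment for n observations and k covariates."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)


# Hypothetical toy data: four binary outcomes and illustrative fitted probabilities.
y = [0, 0, 1, 1]
p_hat = [0.2, 0.4, 0.6, 0.8]

r2 = r2_ss(y, p_hat)
r2_adj = r2_adjusted(r2, n=len(y), k=1)
print(r2, r2_adj)
```

With so few observations the adjusted value falls well below the unadjusted one, which illustrates why a correction matters in small samples or with many covariates.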

[1] John B. Willett, et al. Another Cautionary Note about R²: Its Use in Weighted Least-Squares Regression Analysis, 1988.

[2] Inge S. Helland, et al. On the Interpretation and Use of R² in Regression Analysis, 1987.

[3] J. Hilden. The Area under the ROC Curve and Its Competitors, 1991, Medical Decision Making: an international journal of the Society for Medical Decision Making.

[4] S. Menard. Coefficients of Determination for Multiple Logistic Regression Analysis, 2000.

[5] Marc Buyse. R2: a useful measure of model performance when predicting a dichotomous outcome, by A. Ash and M. Schwartz, Statistics in Medicine, 18, 375–384 (1999), 2000.

[6] J. Shao, et al. The Jackknife and Bootstrap, 1996.

[7] A. Agresti, et al. Summarizing the predictive power of a generalized linear model, 2000, Statistics in Medicine.

[8] Richard Simon, et al. Explained Residual Variation, Explained Risk, and Goodness of Fit, 1991.

[9] A. Ash, et al. R2: a useful measure of model performance when predicting a dichotomous outcome, 1999, Statistics in Medicine.

[10] N. Wermuth, et al. A Comment on the Coefficient of Determination for Binary Responses, 1992.

[11] D. F. Andrews, et al. Data: a collection of problems from many fields for the student and research worker, 1985.

[12] Daniel B. Mark, et al. Tutorial in Biostatistics. Multivariable Prognostic Models: Issues in Developing Models, Evaluating Assumptions and Adequacy, and Measuring and Reducing Errors, 1996.

[13] P. McCullagh, et al. Generalized Linear Models, 1992.

[14] M. Schemper, et al. Explained variation for logistic regression, 1996, Statistics in Medicine.

[15] Anthony C. Davison, et al. Bootstrap Methods and Their Application, 1998.

[16] J. C. van Houwelingen, et al. Predictive value of statistical models, 1990.

[17] J. Copas, et al. Using regression models for prediction: shrinkage and regression to the mean, 1997, Statistical Methods in Medical Research.