Application of Shrinkage Techniques in Logistic Regression Analysis: A Case Study

Logistic regression analysis is often used to develop a predictive model for a dichotomous medical outcome, such as short-term mortality. When the data set is small relative to the number of covariables studied, shrinkage techniques may improve predictions. We compared the performance of three shrinkage techniques: 1) a linear shrinkage factor, which shrinks all coefficients by the same factor; 2) penalized maximum likelihood (or ridge regression), in which a penalty term is added to the likelihood function so that coefficients are shrunk individually according to the variance of each covariable; and 3) the Lasso, which shrinks some coefficients to zero by constraining the sum of the absolute values of the coefficients of standardized covariables. Logistic regression models were constructed to predict 30-day mortality after acute myocardial infarction. Small data sets were created from a large randomized controlled trial, half of which served as independent validation data. We found that all three shrinkage techniques improved the calibration of predictions compared to the standard maximum likelihood estimates. This study illustrates that shrinkage is a valuable tool to overcome some of the problems of overfitting in medical data.
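The three shrinkage variants named above can be illustrated with a minimal pure-Python sketch. It is not the study's implementation: the synthetic data, the shrinkage factor `s = 0.8`, the penalty values `ridge=0.1` and `lasso=0.1`, and the helper names `fit_logistic` and `uniform_shrink` are all assumptions chosen for illustration. In practice the factor and penalties would be tuned, for example by cross-validation or bootstrapping. The covariables are drawn from a standard normal distribution, so they are treated as already standardized.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(b, t):
    # Proximal operator of the L1 penalty: move b toward zero by t, clipping at zero.
    return math.copysign(max(abs(b) - t, 0.0), b)

def fit_logistic(X, y, ridge=0.0, lasso=0.0, lr=0.5, n_iter=1500):
    """Proximal gradient descent on the mean negative log-likelihood
    plus (ridge / 2) * sum(b_j^2) + lasso * sum(|b_j|).
    The intercept is left unpenalized. ridge = lasso = 0 gives the ML fit."""
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(n_iter):
        resid = [sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        g = [sum(r * row[j] for r, row in zip(resid, X)) / n + ridge * b[j]
             for j in range(p)]
        b0 -= lr * g0
        b = [soft_threshold(bj - lr * gj, lr * lasso) for bj, gj in zip(b, g)]
    return b0, b

def uniform_shrink(X, y, b, s, n_iter=50):
    """Multiply every slope by the same factor s, then re-estimate the
    intercept by Newton steps so mean predicted risk matches the event rate."""
    bs = [s * bj for bj in b]
    lp = [sum(bj * xj for bj, xj in zip(bs, row)) for row in X]
    a = 0.0
    for _ in range(n_iter):
        pr = [sigmoid(a + l) for l in lp]
        grad = sum(pi - yi for pi, yi in zip(pr, y))
        hess = sum(pi * (1.0 - pi) for pi in pr)
        a -= grad / hess
    return a, bs

# Small synthetic data set (assumption): two real predictors, two pure noise.
random.seed(0)
n, p = 80, 4
true_b = [1.0, -1.0, 0.0, 0.0]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if random.random() < sigmoid(-0.5 + sum(t * x for t, x in zip(true_b, row)))
     else 0 for row in X]

b0_ml, b_ml = fit_logistic(X, y)              # standard maximum likelihood
a_s, b_s = uniform_shrink(X, y, b_ml, s=0.8)  # 1) linear shrinkage factor
b0_r, b_r = fit_logistic(X, y, ridge=0.1)     # 2) ridge / penalized ML
b0_l, b_l = fit_logistic(X, y, lasso=0.1)     # 3) Lasso

for name, coefs in [("ML", b_ml), ("uniform", b_s), ("ridge", b_r), ("lasso", b_l)]:
    print(name, [round(c, 3) for c in coefs])
```

Comparing the printed coefficient vectors shows the qualitative difference between the methods: uniform shrinkage rescales every maximum likelihood coefficient by the same amount, ridge pulls coefficients toward zero without removing any, and the Lasso can set coefficients exactly to zero once the penalty is large enough.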
