The performance of automated case-mix adjustment regression model building methods in a health outcome prediction setting

We have previously described a system for monitoring a number of healthcare outcomes using case-mix adjustment models. Automating the model-fitting process is desirable in such a system when monitoring covers a large number of outcome measures or subgroup analyses. Our aim was to compare the performance of three variable selection strategies: (1) “manual” selection; (2) “automated” backward elimination with re-categorisation; and (3) inclusion of all variables at once, irrespective of their apparent importance, with automated re-categorisation. Logistic regression models for predicting in-hospital mortality and emergency readmission within 28 days were fitted to an administrative database covering 78 diagnosis groups and 126 procedures in National Health Service (NHS) hospital trusts in England from 1996 to 2006. Model performance was assessed with the Receiver Operating Characteristic (ROC) c statistic, which measures discrimination, and the Brier score, which measures overall predictive accuracy as the mean squared difference between predicted probabilities and observed outcomes. Overall, discrimination was similar for diagnoses and procedures and was consistently better for mortality than for emergency readmission. Brier scores were generally low (lower scores indicate higher accuracy) and were lower for procedures than for diagnoses, with a few exceptions for emergency readmission within 28 days. Of the three variable selection strategies, the automated strategy performed similarly to the manual method in almost all cases, except for low-risk groups with few outcome events. For the rapid generation of multiple case-mix models, we suggest applying automated modelling to reduce the time required, in particular when examining different outcomes for large numbers of procedures and diseases in routinely collected administrative health data.
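The two performance measures named above can be computed directly from each model's predicted probabilities. The sketch below is illustrative only and is not the authors' code: the function names `c_statistic` and `brier_score` are hypothetical, and the c statistic is obtained through its equivalence with the Mann-Whitney rank-sum statistic (average ranks for tied predictions). This avoids comparing every event/non-event pair, which matters for large administrative datasets.

```python
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """ROC c statistic (area under the ROC curve) via the rank-sum identity.

    Equals the proportion of (event, non-event) pairs in which the event has
    the higher predicted risk, with tied predictions counted as one half.
    """
    n = len(y)
    if n != len(p):
        raise ValueError("y and p must have equal length")
    n1 = sum(1 for yi in y if yi == 1)
    n0 = n - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one event and one non-event")
    # Assign 1-based ranks to predictions, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: p[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and p[order[j + 1]] == p[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_events = sum(r for r, yi in zip(ranks, y) if yi == 1)
    # Mann-Whitney U for events divided by the number of event/non-event pairs.
    return (rank_sum_events - n1 * (n1 + 1) / 2) / (n1 * n0)


def brier_score(y: Sequence[int], p: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y) != len(p) or len(y) == 0:
        raise ValueError("y and p must be non-empty and of equal length")
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


if __name__ == "__main__":
    outcomes = [0, 0, 1, 1]            # toy data: 1 = death or readmission
    risks = [0.1, 0.4, 0.35, 0.8]      # toy predicted probabilities
    print(c_statistic(outcomes, risks))  # 0.75
    print(brier_score(outcomes, risks))
```

One caution when reading Brier scores: they depend on outcome prevalence. A model that always predicts the observed event rate π scores π(1 − π), so low scores in low-risk groups partly reflect the rarity of events rather than better discrimination.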
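The “automated” backward elimination strategy can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: each column is treated as a separate candidate variable, significance is judged with Wald tests at a hypothetical threshold `alpha = 0.05`, and the helper names `fit_logistic` and `backward_eliminate` are hypothetical. The automated re-categorisation step is not shown. The logistic model is fitted with NumPy by Newton-Raphson (iteratively reweighted least squares), so the example does not depend on a statistics package.

```python
import math

import numpy as np


def fit_logistic(X: np.ndarray, y: np.ndarray, max_iter: int = 50, tol: float = 1e-8):
    """Fit a logistic regression by Newton-Raphson (IRLS).

    X must already contain an intercept column. Returns the coefficients and
    the two-sided Wald p-value of each coefficient.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        w = mu * (1.0 - mu)
        grad = X.T @ (y - mu)
        info = X.T @ (X * w[:, None])  # Fisher information for the logit link
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the information matrix at the final estimate.
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (mu * (1.0 - mu))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvalues = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return beta, pvalues


def backward_eliminate(X: np.ndarray, y: np.ndarray, names, alpha: float = 0.05):
    """Repeatedly drop the least significant covariate until all have p < alpha.

    Column 0 of X is assumed to be the intercept and is never dropped.
    Returns the names of the retained covariates.
    """
    keep = list(range(1, X.shape[1]))
    while keep:
        cols = [0] + keep
        _, p = fit_logistic(X[:, cols], y)
        covariate_p = p[1:]  # aligned with `keep`; intercept skipped
        worst = int(np.argmax(covariate_p))
        if covariate_p[worst] < alpha:
            break
        keep.pop(worst)
    return [names[j] for j in keep]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 5000
    age = rng.normal(size=n)                 # simulated standardised age
    noise = rng.normal(size=n)               # covariate with no true effect
    emergency = rng.integers(0, 2, size=n)   # simulated admission-type flag
    logit = -2.0 + 0.8 * age + 1.0 * emergency
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    X = np.column_stack([np.ones(n), age, noise, emergency])
    print(backward_eliminate(X, y, ["intercept", "age", "noise", "emergency"]))
```

For categorical predictors coded as several dummy columns, a closer analogue would keep or drop the whole variable using a joint test (for example a likelihood-ratio test) rather than column-by-column Wald tests.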
