Developing well-calibrated illness severity scores for decision support in the critically ill

Illness severity scores are regularly employed for quality improvement and benchmarking in the intensive care unit, but poor generalization performance, particularly with respect to probability calibration, has limited their use for decision support. These models tend to perform worse in patients at high risk of mortality. We hypothesized that a sequential modeling approach would yield superior calibration across the risk spectrum: an initial regression model assigns risk, and all patients deemed high risk then have their risk quantified by a second, high-risk-specific regression model. We compared this approach to a logistic regression model and to a sophisticated machine learning approach, the gradient boosting machine. The sequential approach did not affect the receiver operating characteristic curve or the precision-recall curve but resulted in improved reliability curves. The gradient boosting machine achieved a small improvement in discrimination and was calibrated similarly to the sequential models.
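The sequential idea can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes a plain gradient-descent logistic regression, synthetic two-feature data, a hypothetical high-risk cutoff of 0.10, and a second-stage model trained only on the patients the first stage flags as high risk. All function names (`fit_sequential`, `predict_sequential`, `reliability_curve`) are illustrative. The reliability curve is computed by binning predicted risk and comparing the mean prediction in each bin with the observed event rate.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_one(w, x):
    # w = [bias, w1, ..., wd]; returns a predicted probability.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log loss.
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_one(w, xi) - yi
            grad[0] += err
            for j in range(d):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def fit_sequential(X, y, threshold=0.10):
    # Stage 1: a model fitted on all patients assigns an initial risk.
    stage1 = fit_logistic(X, y)
    # Stage 2: refit on the patients stage 1 deems high risk
    # (the 0.10 cutoff is an assumption made for this sketch).
    high = [i for i, x in enumerate(X) if predict_one(stage1, x) >= threshold]
    if high:
        stage2 = fit_logistic([X[i] for i in high], [y[i] for i in high])
    else:
        stage2 = stage1  # no high-risk patients: fall back to stage 1
    return stage1, stage2, threshold

def predict_sequential(model, X):
    stage1, stage2, threshold = model
    preds = []
    for x in X:
        p = predict_one(stage1, x)
        # High-risk patients have their risk re-quantified by stage 2.
        preds.append(predict_one(stage2, x) if p >= threshold else p)
    return preds

def reliability_curve(y, p, n_bins=10):
    # Returns (mean predicted risk, observed rate, count) per non-empty bin.
    bins = [[] for _ in range(n_bins)]
    for yi, pi in zip(y, p):
        bins[min(int(pi * n_bins), n_bins - 1)].append((yi, pi))
    return [
        (sum(pi for _, pi in b) / len(b), sum(yi for yi, _ in b) / len(b), len(b))
        for b in bins if b
    ]

# Synthetic cohort, purely illustrative; no real patient data is used.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(400)]
y = [1 if rng.random() < sigmoid(-2.0 + 1.5 * a + 0.8 * b) else 0 for a, b in X]

model = fit_sequential(X, y)
preds = predict_sequential(model, X)
curve = reliability_curve(y, preds)
for mean_pred, observed, count in curve:
    print(f"predicted={mean_pred:.2f} observed={observed:.2f} n={count}")
```

A well-calibrated model produces bins in which the predicted and observed values are close. In practice the curve would be computed on held-out data rather than on the training cohort used here.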
