Predicting the probability of survival in intensive care unit patients from a small number of variables and training examples

OBJECTIVE Survival probability predictions for critically ill patients are mainly used to measure the efficacy of intensive care unit (ICU) treatment. The available models are functions induced from data on thousands of patients. In some cases, the variables these models rely on are not part of the clinical routine and may not be recorded for every patient. In this paper, we propose a new method for building scoring functions that make reliable predictions yet can be induced from the records of a small set of patients described by a few variables.

METHODS We present a learning method based on support vector machines (SVM), together with a detailed study of its prediction performance in different contexts, using groups of variables defined according to their source of information: monitoring devices, laboratory findings, and demographic and diagnostic features.

RESULTS We used a data set collected in general ICUs at 10 hospital units in Spain, 6 of which treat coronary patients while the other 4 do not. The study covered 2501 patients, 19.83% of whom did not survive. Using these data, we compare the proposed SVM method with other approaches based on logistic regression (LR), including a second-level recalibration, induced from the available data, of release III of the Acute Physiology and Chronic Health Evaluation (APACHE, a scoring system commonly used in ICUs). The SVM method outperforms all of them, and the differences are statistically significant. A comparison with the commercial version of APACHE III shows that the SVM scores are slightly better when working with data sets of more than 500 patients.

CONCLUSIONS From a practical point of view, these results may help in building inexpensive and reliable prediction systems tailored to the particular characteristics of each ICU and its kinds of patients.
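The abstract does not say which measures were used to compare the SVM and LR scores. As an illustration only, the sketch below computes two measures commonly used for probabilistic survival predictions: the Brier score (calibration and accuracy of the probabilities) and the area under the ROC curve (discrimination). The function names, the toy outcomes, and the two sets of predicted probabilities are hypothetical and are not taken from the study.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted survival
    probability and the observed outcome (1 = survived, 0 = died).
    Lower is better."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def auc(scores, outcomes):
    """Area under the ROC curve, computed as the fraction of
    (survivor, non-survivor) pairs in which the survivor receives
    the higher score; ties count as half a pair."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six patients, two of whom died.
outcomes = [1, 1, 1, 0, 1, 0]
# Hypothetical predicted survival probabilities from two models.
model_a = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
model_b = [0.9, 0.5, 0.7, 0.6, 0.6, 0.2]

for name, probs in (("model_a", model_a), ("model_b", model_b)):
    print(name, "Brier:", round(brier_score(probs, outcomes), 4),
          "AUC:", round(auc(probs, outcomes), 4))
```

On real data, these measures would be estimated on held-out patients and the differences between models checked with a statistical test before drawing conclusions.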
