Data preprocessing and mortality prediction: The Physionet/CinC 2012 challenge revisited

The Physionet/CinC 2012 challenge focused on improving patient specific mortality predictions in the intensive care unit. While most of the focus in the challenge was on applying sophisticated machine learning algorithms, little attention was paid to the preprocessing performed on the data a priori. We compare four standard pre-processing methods with a novel Box-Cox outlier rejection technique and analyze their effect on machine learning classifiers for predicting the mortality of ICU patients. The best machine learning model utilized the proposed preprocessing method and achieved an AUROC of 0.848. In general, the AUROC of models using our novel preprocessing method increased, and this increase was as much as 0.02 in some cases. Furthermore, the use of preprocessing improved the performance of regression models to a higher level than that of non-linear techniques such as random forests. We demonstrate that proper preprocessing of the data prior to use in a prognostic model can significantly improve performance. This improvement can be even greater than that provided by more complex non-linear machine learning algorithms.