Early Prediction of Sepsis via SMOTE Upsampling and Mutual Information Based Downsampling

Sepsis is a life-threatening response to infection that can lead to tissue damage, organ failure and death. Early prediction of sepsis is important because it can reduce the adverse patient outcomes associated with late-stage septic shock. However, effective early prediction is challenging because the data are often heavily imbalanced, with far fewer positive sepsis diagnoses than negative ones. If the class imbalance is not addressed, trained models tend to be biased toward the majority class, which degrades performance on the minority class. In this paper, we propose a two-step method for early prediction of sepsis that combines a mutual information based downsampling algorithm with the Synthetic Minority Over-sampling Technique (SMOTE). Using this method, our team, Kent Ridge AI (ranked 77th), obtained a utility score of -0.164 on the full test set. We also report cross-validation results and identify several ways to improve performance.
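To make the upsampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. It is a simplified illustration and not the implementation used in the paper. The function name `smote`, its parameters, and the toy feature vectors are assumptions made for this example, and base samples are drawn at random rather than visited in turn. The mutual information based downsampling step is not sketched, because the abstract does not specify it.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority neighbours (squared Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # For every minority sample, store the indices of its k nearest neighbours.
    neighbours = []
    for i, point in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(point, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # random base sample
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        base, neighbour = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature vectors standing in for positive (septic) records.
septic = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote(septic, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two real minority samples, it always lies within the range spanned by the original minority class in each feature.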
