Reinforcement Learning for Sepsis Treatment: Baselines and Analysis

In this work, we consider the task of learning effective medical treatment policies for sepsis, a dangerous health condition, from observational data. We examine the performance of several reinforcement learning methods on this problem while varying the state representation used. We develop careful baselines for these methods under each state representation, standardising the reward function formulation and the evaluation methodology. Our results indicate that simple tabular Q-learning and Deep Q-Learning yield the most effective treatment strategies among the methods examined, and that temporal encoding in the state representation aids in discovering improved policies.
