Clinician-in-the-Loop Decision Making: Reinforcement Learning with Near-Optimal Set-Valued Policies

Standard reinforcement learning (RL) aims to find an optimal policy that identifies the single best action for each state. In healthcare settings, however, many actions may be near-equivalent with respect to the reward (e.g., survival). We consider an alternative objective: learning set-valued policies that capture near-equivalent actions leading to similar cumulative rewards. We propose a model-free algorithm based on temporal difference learning and a near-greedy heuristic for action selection. We analyze the theoretical properties of the proposed algorithm, providing optimality guarantees, and demonstrate our approach on simulated environments and a real clinical task. Empirically, the proposed algorithm exhibits good convergence properties and discovers meaningful near-equivalent actions. Our work provides theoretical as well as practical foundations for clinician/human-in-the-loop decision making, in which humans (e.g., clinicians, patients) can incorporate additional knowledge (e.g., side effects, patient preferences) when selecting among near-equivalent actions.
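To make the set-valued idea concrete, below is a minimal sketch of how a near-greedy, set-valued policy could sit on top of tabular temporal difference learning. The tolerance parameter zeta, the helper names (near_optimal_set, td_update, set_valued_policy), and the mean-over-set backup are illustrative assumptions rather than the paper's exact formulation; the sketch also assumes non-negative Q-values so that the multiplicative threshold is well defined.

```python
import numpy as np

def near_optimal_set(q_row, zeta):
    """Actions whose value is within a zeta-fraction of the best
    action's value in this state (assumes non-negative Q-values)."""
    q_max = np.max(q_row)
    return np.flatnonzero(q_row >= (1.0 - zeta) * q_max)

def td_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, zeta=0.05):
    """One TD backup that bootstraps from the near-optimal action set
    (here via its mean value) instead of the single greedy maximum."""
    candidates = near_optimal_set(Q[s_next], zeta)
    target = r + gamma * Q[s_next, candidates].mean()
    Q[s, a] += alpha * (target - Q[s, a])

def set_valued_policy(Q, s, zeta=0.05):
    """At decision time, surface all near-equivalent actions; a
    clinician can then choose among them using knowledge the reward
    does not encode (side effects, patient preference, etc.)."""
    return near_optimal_set(Q[s], zeta)
```

In this sketch, setting zeta = 0 recovers the standard greedy backup and a singleton policy, while larger zeta trades some value optimality for a larger menu of near-equivalent actions to present to the human in the loop.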
