POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning

Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings typical in healthcare, where only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
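To make the objective concrete: rather than maximizing generative likelihood alone and planning afterward, as in the two-stage pipeline, the idea is to optimize a single trade-off of the form log p(observations | actions; θ) + λ · V̂(π_θ), where V̂ is an off-policy estimate of the value of the policy induced by the model parameters θ. The following is a minimal sketch under toy assumptions (a one-step decision problem, a logistic stand-in for the belief-state policy, a Gaussian stand-in for the generative term, and an importance-sampling value estimate); all function names here are illustrative, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy retrospective batch: observation, the action logged by the behavior
# (clinician) policy, that policy's action probability, and the reward.
obs      = rng.normal(size=100)
actions  = rng.integers(0, 2, size=100)
beh_prob = np.full(100, 0.5)                  # behavior policy was uniform
rewards  = np.where(actions == (obs > 0), 1.0, 0.0)

def policy_prob(theta, obs, actions):
    """Probability the model-induced policy assigns to each logged action.
    A logistic policy on the raw observation stands in for the belief-state
    policy obtained by (approximately) solving the learned POMDP."""
    p1 = 1.0 / (1.0 + np.exp(-theta * obs))   # P(action = 1 | obs)
    return np.where(actions == 1, p1, 1.0 - p1)

def log_lik(theta, obs):
    """Stand-in generative term: Gaussian log-likelihood of observations."""
    return -0.5 * np.mean((obs - np.tanh(theta)) ** 2)

def is_value(theta):
    """Importance-sampling estimate of the induced policy's value, computed
    entirely from data gathered under the behavior policy (off-policy)."""
    w = policy_prob(theta, obs, actions) / beh_prob
    return np.mean(w * rewards)

def popcorn_objective(theta, lam=10.0):
    # Lagrangian relaxation of the prediction constraint: generative fit
    # plus lam times off-policy policy value, optimized jointly.
    return log_lik(theta, obs) + lam * is_value(theta)

# Crude grid search in place of gradient ascent, to keep the sketch short.
grid = np.linspace(-5, 5, 201)
best = grid[np.argmax([popcorn_objective(t) for t in grid])]
print(f"best theta: {best:.2f}, estimated value: {is_value(best):.3f}")
```

In a full instantiation, θ would parameterize the POMDP's transition, observation, and reward models; V̂ would come from a consistent off-policy estimator over multi-step trajectories; and the grid search would be replaced by gradient-based optimization.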
