Counterfactually Guided Policy Transfer in Clinical Settings

Reliably transferring treatment policies learned in one clinical environment to another is currently limited by domain shift. In this paper we address off-policy learning for sequential decision making under domain shift, a scenario susceptible to catastrophic overconfidence. This problem is highly relevant to high-stakes clinical settings, where the target domain may also be data-scarce. We propose a two-fold counterfactual regularization procedure that improves off-policy learning under both domain shift and data scarcity. First, we use an informative prior derived from a data-rich source environment to indirectly improve the sampling of counterfactual observations. Second, these samples are used to learn a policy for the target domain, regularized by the source policy through KL-divergence. In simulated sepsis treatment, our counterfactual policy transfer procedure significantly improves the performance of a learned treatment policy.
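
The two ingredients named in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the function names `posterior_transitions` and `kl_regularized_policy`, the Dirichlet form of the source-derived prior, the parameters `prior_strength` and `temperature`, and the direction of the divergence, KL(pi || pi_source). Under that last assumption, maximizing expected Q-value minus `temperature` times the KL term has the closed form pi(a|s) proportional to pi_source(a|s) * exp(Q(s,a) / temperature).

```python
import numpy as np


def posterior_transitions(target_counts, source_counts, prior_strength=1.0):
    """Posterior-mean transition probabilities for the target domain.

    Assumption: the informative prior is a Dirichlet whose pseudo-counts are
    the (Laplace-smoothed) source transition frequencies, scaled by
    `prior_strength`. The last axis indexes next states.
    """
    smoothed = source_counts + 1.0  # avoid zero rows in the source counts
    source_probs = smoothed / smoothed.sum(axis=-1, keepdims=True)
    posterior = target_counts + prior_strength * source_probs
    return posterior / posterior.sum(axis=-1, keepdims=True)


def kl_regularized_policy(q_values, source_policy, temperature=1.0):
    """Policy maximizing E_pi[Q] - temperature * KL(pi || source_policy).

    q_values and source_policy have shape (n_states, n_actions).
    The maximizer is proportional to source_policy * exp(Q / temperature).
    """
    logits = np.log(source_policy + 1e-12) + q_values / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)
```

A large `temperature` keeps the learned policy close to the source policy, while a small one lets the target-domain Q-values dominate. Likewise, a large `prior_strength` makes the target transition estimate lean on the source environment when target data are scarce.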
