Supervised Reinforcement Learning with Recurrent Neural Network for Dynamic Treatment Recommendation

Dynamic treatment recommendation systems based on large-scale electronic health records (EHRs) have become key to improving practical clinical outcomes. Prior studies recommend treatments using either supervised learning (e.g., matching the indicator signal that denotes doctor prescriptions) or reinforcement learning (e.g., maximizing an evaluation signal that reflects cumulative reward from survival rates). However, none of these studies has considered combining the benefits of supervised learning and reinforcement learning. In this paper, we propose Supervised Reinforcement Learning with Recurrent Neural Network (SRL-RNN), which fuses the two into a synergistic learning framework. Specifically, SRL-RNN applies an off-policy actor-critic framework to handle complex relations among multiple medications, diseases, and individual patient characteristics. The "actor" in the framework is adjusted by both the indicator signal and the evaluation signal to ensure effective prescription and low mortality. An RNN is further utilized to address the Partially Observed Markov Decision Process (POMDP) problem arising from the lack of fully observed states in real-world applications. Experiments on the publicly available real-world MIMIC-III dataset illustrate that our model can reduce estimated mortality while providing promising accuracy in matching doctors' prescriptions.
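The core idea of adjusting the actor by both the indicator signal (doctor's prescription) and the evaluation signal (critic's value estimate) can be illustrated with a minimal sketch. Everything here is an illustrative assumption, not the paper's actual architecture: a linear actor stands in for the RNN, `critic_q` is a toy placeholder for the learned critic, and `EPSILON`, the feature and medication dimensions, and the state values are all made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 8 observed patient features, 5 candidate medications.
STATE_DIM, N_MEDS = 8, 5
EPSILON = 0.5  # assumed mixing weight between the RL and supervised terms

# Linear actor standing in for the recurrent policy network.
W = rng.normal(scale=0.1, size=(N_MEDS, STATE_DIM))

def softmax(z):
    z = z - z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def actor(state):
    """Medication probabilities for one patient state."""
    return softmax(W @ state)

def critic_q(state, probs):
    """Toy stand-in for the critic's Q-value (the evaluation signal)."""
    return float(state.sum() * probs.max())

state = rng.normal(size=STATE_DIM)
doctor_rx = 2  # indicator signal: index of the medication the doctor prescribed

probs = actor(state)
supervised_loss = -np.log(probs[doctor_rx])  # match the doctor's prescription
rl_loss = -critic_q(state, probs)            # encourage high critic value
combined_loss = EPSILON * rl_loss + (1 - EPSILON) * supervised_loss
```

Minimizing `combined_loss` pulls the actor toward the doctor's behavior while still steering it by the critic's outcome estimate, which is the synergy the abstract describes.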
