Estimating Dynamic Treatment Regimes in Mobile Health Using V-Learning

Abstract: The vision for precision medicine is to use individual patient characteristics to inform a personalized treatment plan that leads to the best possible healthcare for each patient. Mobile technologies have an important role to play in this vision, as they offer a means to monitor a patient’s health status in real time and subsequently to deliver interventions if, when, and in the dose that they are needed. Dynamic treatment regimes formalize individualized treatment plans as sequences of decision rules, one per stage of clinical intervention, that map current patient information to a recommended treatment. However, most existing methods for estimating optimal dynamic treatment regimes are designed for a small number of fixed decision points occurring on a coarse time scale. We propose a new reinforcement learning method for estimating an optimal treatment regime that is applicable to data collected using mobile technologies in an outpatient setting. The proposed method accommodates an indefinite time horizon and minute-by-minute decision making, both of which are common in mobile health applications. We show that the proposed estimators are consistent and asymptotically normal under mild conditions. The proposed method is applied to estimate an optimal dynamic treatment regime for controlling blood glucose levels in patients with type 1 diabetes.
