Personalized HeartSteps

With the recent evolution of mobile health technologies, health scientists are increasingly interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via notifications on a mobile device and designed to help users avoid negative health outcomes and to adopt and maintain healthy behaviors. A JITAI involves a sequence of decision rules (i.e., a treatment policy) that takes the user's current context as input and specifies whether, and what type of, intervention should be provided at that moment. In this paper, we develop a Reinforcement Learning (RL) algorithm that continuously learns and improves the treatment policy embedded in the JITAI as data is collected from the user. This work is motivated by our collaboration on the design of the RL algorithm in HeartSteps V2, which builds on data from HeartSteps V1. HeartSteps is a mobile health application for promoting physical activity. The RL algorithm developed in this paper is being used in HeartSteps V2 to decide, five times per day, whether to deliver a context-tailored activity suggestion.
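
To make the sequential decision problem concrete, the sketch below shows one plausible form such an online algorithm could take: a Thompson-sampling contextual bandit with a Bayesian linear reward model that, at each decision point, draws parameters from the posterior and sends a suggestion only if the sampled treatment advantage is positive. This is a minimal illustrative sketch, not the algorithm specified in the paper; the class name, feature layout, and reward definition are hypothetical.

```python
# Illustrative sketch (not the paper's actual algorithm): Thompson sampling for
# a contextual bandit with a Bayesian linear model of the treatment advantage,
# deciding at each of the five daily decision points whether to send a
# context-tailored activity suggestion. Names and features are hypothetical.
import numpy as np

class LinearThompsonSampler:
    def __init__(self, dim, prior_var=1.0, noise_var=1.0):
        self.noise_var = noise_var
        self.precision = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)                    # precision-weighted mean term

    def _posterior(self):
        cov = np.linalg.inv(self.precision)
        mean = cov @ self.b
        return mean, cov

    def decide(self, context):
        """Return 1 (send a suggestion) or 0 (do nothing) at this decision point."""
        mean, cov = self._posterior()
        theta = np.random.multivariate_normal(mean, cov)  # draw from the posterior
        advantage = theta @ context  # sampled advantage of sending vs. not sending
        return int(advantage > 0)

    def update(self, context, action, reward):
        """Standard Bayesian linear-regression update after the proximal outcome is observed."""
        if action == 1:  # only the treatment-advantage term is modeled in this sketch
            x = context
            self.precision += np.outer(x, x) / self.noise_var
            self.b += x * reward / self.noise_var

# Example: one decision point with a 3-dimensional context
# (e.g., recent step count, a location indicator, time of day), all hypothetical.
agent = LinearThompsonSampler(dim=3)
context = np.array([0.2, 1.0, 0.5])
action = agent.decide(context)
agent.update(context, action, reward=120.0)  # e.g., steps observed after the decision point
```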
