Intelligent Pooling in Thompson Sampling for Rapid Personalization in Mobile Health

Mobile health (mHealth) applications can provide users with essential and timely feedback. From physical activity suggestions to stress-reduction techniques, mHealth can deliver a wide spectrum of effective treatments. Personalizing these interventions could substantially improve their effectiveness, as individuals vary widely in their response to treatment. An optimal mHealth policy must address the question of when to intervene, and the answer is likely to differ between individuals. The high level of noise inherent in the in situ delivery of mHealth interventions can cripple learning when a policy has access to only a single user's data. When there is limited time to engage users, slow learning is a serious problem and may raise the risk that users leave a study. To speed up learning an optimal policy for each user, we propose learning personalized policies via intelligent use of other users' data. The proposed learning algorithm allows us to pool information from other users in a principled, adaptive manner. The algorithm combines Thompson sampling with a Bayesian random effects model for the reward function. We use data collected from a real-world mobile health study to build a generative model and evaluate the proposed algorithm against two natural alternatives: learning the treatment policy separately per person and learning a single treatment policy for all people. This work is motivated by our preparations for a real-world follow-up study in which the proposed algorithm will be used on a subset of the participants.
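
To make the pooling idea concrete, here is a minimal sketch in Python, assuming a linear reward model with Gaussian noise and known variance hyperparameters. The class and parameter names (PooledThompsonSampler, noise_var, user_var) are illustrative, and the partial-pooling update is a simplified stand-in for the full Bayesian random-effects posterior, not the authors' implementation.

```python
import numpy as np


class PooledThompsonSampler:
    """Thompson sampling with a Bayesian linear reward model per user, partially
    pooled through a shared population model. A simplified stand-in for the
    random-effects model described above, not the study implementation."""

    def __init__(self, n_users, dim, noise_var=1.0, pop_prior_var=1.0, user_var=0.25):
        self.noise_var = noise_var
        # Population weights w_pop ~ N(0, pop_prior_var * I), stored as a
        # precision matrix and a precision-weighted mean.
        self.A_pop = np.eye(dim) / pop_prior_var
        self.b_pop = np.zeros(dim)
        # Each user's weights are modeled as w_i = w_pop + u_i, u_i ~ N(0, user_var * I).
        self.user_var = user_var
        # Per-user likelihood statistics (start empty; pooling supplies the prior).
        self.A_user = [np.zeros((dim, dim)) for _ in range(n_users)]
        self.b_user = [np.zeros(dim) for _ in range(n_users)]

    def _user_posterior(self, i):
        # The pooled posterior over w_pop serves as the prior for user i, with its
        # covariance inflated by user_var to allow individual deviation; the user's
        # own data then sharpens the posterior. (The pooled statistics also contain
        # this user's observations, a deliberate simplification.)
        pop_cov = np.linalg.inv(self.A_pop)
        pop_mean = pop_cov @ self.b_pop
        prior_prec = np.linalg.inv(pop_cov + self.user_var * np.eye(len(pop_mean)))
        cov = np.linalg.inv(prior_prec + self.A_user[i])
        mean = cov @ (prior_prec @ pop_mean + self.b_user[i])
        return mean, (cov + cov.T) / 2  # symmetrize for sampling stability

    def choose_action(self, i, contexts):
        # contexts: (n_actions, dim) array, one feature vector per candidate action.
        mean, cov = self._user_posterior(i)
        w = np.random.multivariate_normal(mean, cov)  # Thompson draw
        return int(np.argmax(contexts @ w))

    def update(self, i, context, reward):
        # Standard Bayesian linear-regression updates, applied both to the user's
        # own statistics and to the pooled population statistics.
        outer = np.outer(context, context) / self.noise_var
        self.A_user[i] += outer
        self.b_user[i] += context * reward / self.noise_var
        self.A_pop += outer
        self.b_pop += context * reward / self.noise_var
```

In this sketch, a new user with no data falls back on the pooled posterior, while a user with many observations is governed mostly by their own data; user_var controls how quickly personalization overrides pooling, which is the trade-off the proposed algorithm adapts in a principled way.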
