iqLearn: Interactive Q-Learning in R.

Treatment strategies for chronic illness must adapt to the evolving health status of the patient receiving treatment. Data-driven dynamic treatment regimes can offer guidance to clinicians and intervention scientists on how to treat patients over time so as to produce the most favorable clinical outcome on average. Methods for estimating optimal dynamic treatment regimes, such as Q-learning, typically require modeling nonsmooth, nonmonotone transformations of the data. Consequently, building well-fitting models can be challenging, and a poorly fitting model can yield a poor estimate of the optimal treatment regime. Interactive Q-learning (IQ-learning) is an alternative to Q-learning that requires modeling only smooth, monotone transformations of the data. The R package iqLearn provides functions implementing both the IQ-learning and Q-learning algorithms. We demonstrate how to estimate a two-stage optimal treatment policy with iqLearn using a simulated data set, bmiData, which mimics a two-stage randomized body mass index reduction trial with binary treatments at each stage.
