Q-Learning: Flexible Learning About Useful Utilities

Dynamic treatment regimes are fast becoming an important part of medicine, with the corresponding shift in emphasis from treating the disease to treating the individual patient. Because trials evaluating personally tailored treatment sequences are scarce, inferring optimal treatment regimes from observational data has taken on increased importance. Q-learning is a popular method for estimating the optimal treatment regime, originally in randomized trials but more recently also in observational data. Previous applications of Q-learning have largely been restricted to continuous utility end-points with linear relationships. This paper is the first attempt both to extend the framework to discrete utilities and to move covariate modelling from linear models to the more flexible generalized additive model (GAM) framework. Simulation results show that GAM-adapted Q-learning typically outperforms Q-learning with linear models, as well as other frequently used methods based on propensity scores, in terms of coverage and bias/MSE. This represents a promising step toward a more fully general Q-learning approach to estimating optimal dynamic treatment regimes.
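To make the Q-learning step concrete, the following is a minimal single-stage sketch with simulated data. It uses a linear working model for the Q-function purely for self-containedness; the paper's contribution is to replace exactly this working model with a GAM smooth. All variable names and the simulation design are hypothetical, not taken from the paper.

```python
import numpy as np

# Hypothetical single-stage setting: one covariate x, binary treatment a,
# continuous utility y. Treatment is beneficial only when x > 0.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                      # patient covariate
a = rng.integers(0, 2, size=n)              # randomized binary treatment
y = 1.0 + 0.5 * x + a * x + rng.normal(scale=0.5, size=n)

# Linear Q-model: E[Y | x, a] = b0 + b1*x + a*(b2 + b3*x).
# A GAM version would replace the linear terms with smooth functions of x.
X = np.column_stack([np.ones(n), x, a, a * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def optimal_rule(x_new):
    """Estimated regime: treat iff the fitted treatment contrast
    b2 + b3*x is positive at the patient's covariate value."""
    return (beta[2] + beta[3] * x_new > 0).astype(int)

print(optimal_rule(np.array([-1.0, 1.0])))  # recovers: no treatment at x=-1, treatment at x=1
```

In a multi-stage regime the same regression is applied backwards in time, with the stage-t pseudo-outcome defined as the maximum of the fitted stage-(t+1) Q-function over treatments.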
