Q-learning residual analysis: application to the effectiveness of sequences of antipsychotic medications for patients with schizophrenia

Q-learning is a regression-based approach that uses longitudinal data to construct dynamic treatment regimes: sequences of decision rules that use patient information to inform future treatment decisions. An optimal dynamic treatment regime comprises decision rules that individualize treatment, using patients' baseline and time-varying characteristics, so as to optimize the final outcome. Constructing optimal regimes with Q-learning depends heavily on the assumption that the regression model at each decision point is correctly specified; yet model checking in the context of Q-learning has been largely overlooked in the literature. In this article, we show that residual plots obtained from standard Q-learning models may fail to adequately check the quality of the model fit. We propose a modified Q-learning procedure that accommodates residual analyses using standard tools, and we present simulation studies showing its advantage over standard Q-learning. We illustrate the new approach using data from a sequential multiple assignment randomized trial of patients with schizophrenia. Copyright © 2016 John Wiley & Sons, Ltd.
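To make the backward-recursive structure of Q-learning concrete, here is a minimal sketch of the standard two-stage procedure on simulated data. All variable names and the data-generating model are illustrative assumptions, not the trial data or the authors' modified procedure: stage-2 outcome `Y` is regressed on history and treatment, a pseudo-outcome is formed by maximizing the fitted stage-2 model over treatments, and the stage-1 model is fit to that pseudo-outcome. Residuals at each stage are what a model-checking analysis would plot.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical two-stage data: X1 baseline covariate, X2 intermediate
# covariate, A1/A2 randomized treatments coded +/-1, Y final outcome.
X1 = rng.normal(size=n)
A1 = rng.choice([-1.0, 1.0], size=n)
X2 = 0.5 * X1 + 0.3 * A1 + rng.normal(size=n)
A2 = rng.choice([-1.0, 1.0], size=n)
Y = X2 + A2 * (0.4 + 0.6 * X2) + rng.normal(size=n)

def ols(design, y):
    """Least-squares coefficients for a linear Q-function."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def q2(x2, a2, beta):
    """Fitted stage-2 Q-function with a treatment-by-covariate interaction."""
    return beta[0] + beta[1] * x2 + a2 * (beta[2] + beta[3] * x2)

# Stage 2: regress Y on history, treatment, and their interaction.
D2 = np.column_stack([np.ones(n), X2, A2, A2 * X2])
b2 = ols(D2, Y)

# Pseudo-outcome: predicted outcome under the better stage-2 treatment.
# The max makes this a nonsmooth function of the stage-2 fit.
pseudo = np.maximum(q2(X2, 1.0, b2), q2(X2, -1.0, b2))

# Stage 1: regress the pseudo-outcome on baseline history and treatment.
D1 = np.column_stack([np.ones(n), X1, A1, A1 * X1])
b1 = ols(D1, pseudo)

# Residuals for model checking at each stage (in practice, plotted
# against fitted values or covariates to assess model fit).
res2 = Y - D2 @ b2
res1 = pseudo - D1 @ b1
```

Note how the stage-1 response is the maximized pseudo-outcome rather than an observed quantity; this is why, as the article argues, residual plots from the standard procedure can be hard to interpret, motivating the modification proposed here.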
