Doubly-robust dynamic treatment regimen estimation via weighted least squares

Personalized medicine is a rapidly expanding area of health research in which patient-level information is used to inform treatment decisions. Dynamic treatment regimens (DTRs) formalize the sequence of treatment decisions that characterize personalized management plans. Identifying the DTR that optimizes expected patient outcome is of obvious interest, and numerous methods have been proposed for this purpose. We present a new approach that builds on two established methods, Q-learning and G-estimation, offering the doubly robust property of the latter with ease of implementation much closer to the former. We outline the underlying theory, provide simulation studies that demonstrate the double-robustness and efficiency properties of our approach, and illustrate its use on data from the Promotion of Breastfeeding Intervention Trial.
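To make the weighted least squares idea concrete, the following is a minimal single-stage sketch in Python. It assumes (as an illustration, not a reproduction of the paper's implementation) absolute-residual weights w = |a − π̂(x)|, a logistic propensity model fit by Newton-Raphson, and a deliberately misspecified treatment-free model; the simulated data, coefficient names, and tolerance are all hypothetical choices for this example. With correct propensity weights, the blip (treatment effect) estimates remain consistent even though the treatment-free part of the outcome model omits the quadratic term, illustrating the doubly robust property described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)

# Confounded treatment assignment: P(A = 1 | x) = expit(0.5 * x)
p_true = 1 / (1 + np.exp(-0.5 * x))
a = rng.binomial(1, p_true)

# Outcome: nonlinear treatment-free part plus blip a * (psi0 + psi1 * x),
# with true blip parameters psi = (1.0, -0.5)
y = x + x**2 + a * (1.0 - 0.5 * x) + rng.normal(size=n)

# Step 1: fit a logistic propensity model on [1, x] via Newton-Raphson
X = np.column_stack([np.ones(n), x])
beta = np.zeros(2)
for _ in range(25):
    mu = 1 / (1 + np.exp(-(X @ beta)))
    W = mu * (1 - mu)
    beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (a - mu))
pi_hat = 1 / (1 + np.exp(-(X @ beta)))

# Step 2: absolute-residual weights, the choice that yields double robustness
w = np.abs(a - pi_hat)

# Step 3: weighted least squares of y on a (misspecified) linear
# treatment-free model plus the blip terms a and a * x
D = np.column_stack([np.ones(n), x, a, a * x])
coef = np.linalg.solve(D.T @ (w[:, None] * D), D.T @ (w * y))
psi_hat = coef[2:]  # blip estimates; consistent despite the omitted x**2 term

# Estimated optimal rule: treat when the estimated blip is positive
def rule(x_new):
    return (psi_hat[0] + psi_hat[1] * np.asarray(x_new) > 0).astype(int)
```

In this simulation the weights rescue the blip estimates from the misspecified outcome model; conversely, a correct outcome model protects against a misspecified propensity, which is the sense in which the estimator is doubly robust.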
