Q‐learning for estimating optimal dynamic treatment rules from observational data

The area of dynamic treatment regimes (DTRs) aims to make inference about adaptive, multistage decision-making in clinical practice. A DTR is a set of decision rules, one per treatment interval, where each rule is a function of the treatment and covariate history that returns a recommended treatment. Q-learning is a popular method from the reinforcement learning literature that has recently been applied to estimate DTRs. While, in principle, Q-learning can be used with both randomized and observational data, the literature has thus far focused exclusively on the randomized treatment setting. We extend the method to incorporate measured confounding covariates, using direct adjustment and a variety of propensity score approaches. We examine the methods under various settings, including non-regular scenarios. We illustrate the methods by examining the effect of breastfeeding on vocabulary testing, using data from the Promotion of Breastfeeding Intervention Trial (PROBIT).
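A minimal sketch of two-stage Q-learning with measured confounders, assuming linear working models, treatments coded -1/+1, a single confounder per stage, and simulated data; none of these choices come from the source. Direct adjustment enters the confounders as main effects in each Q-function. As one example of a propensity score approach, the sketch also adds the estimated propensity score as an extra covariate (the `use_ps` flag); this is an illustrative assumption, not necessarily the specification used in the paper. The helper names `fit_propensity`, `ols`, and `q_learning_two_stage` are hypothetical. Stage 2 is fitted first, and the predicted outcome under the estimated optimal stage-2 treatment then serves as the pseudo-outcome for stage 1.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_propensity(covariates, treated, n_iter=25):
    """Logistic regression of a 0/1 treatment indicator on covariates via
    Newton-Raphson; returns fitted probabilities P(A = 1 | covariates)."""
    X = np.column_stack([np.ones(len(treated)), covariates])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (treated - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return expit(X @ beta)


def ols(design, y):
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def q_learning_two_stage(x1, a1, x2, a2, y, use_ps=True):
    """Two-stage Q-learning with linear working models (hypothetical helper).
    Treatments a1, a2 are coded -1/+1; x1, x2 are scalar confounders."""
    one = np.ones(len(y))

    # Stage 2: history main effects (direct adjustment), optionally plus the PS.
    cols2 = [one, x1, a1, x2]
    if use_ps:
        cols2.append(fit_propensity(np.column_stack([x1, a1, x2]), (a2 > 0).astype(float)))
    main2 = np.column_stack(cols2)
    coef2 = ols(np.column_stack([main2, a2, a2 * x2]), y)
    psi2 = coef2[-2:]                    # stage-2 blip: a2 * (psi20 + psi21 * x2)
    blip2 = psi2[0] + psi2[1] * x2
    rule2 = np.where(blip2 > 0, 1, -1)

    # Pseudo-outcome: predicted outcome under the estimated optimal a2,
    # i.e. max over a2 in {-1, +1} of the fitted stage-2 Q-function.
    y_tilde = main2 @ coef2[:-2] + np.abs(blip2)

    # Stage 1: same construction, regressing the pseudo-outcome.
    cols1 = [one, x1]
    if use_ps:
        cols1.append(fit_propensity(x1[:, None], (a1 > 0).astype(float)))
    main1 = np.column_stack(cols1)
    coef1 = ols(np.column_stack([main1, a1, a1 * x1]), y_tilde)
    psi1 = coef1[-2:]                    # stage-1 blip: a1 * (psi10 + psi11 * x1)
    rule1 = np.where(psi1[0] + psi1[1] * x1 > 0, 1, -1)
    return psi1, psi2, rule1, rule2


# Simulated observational data: each treatment depends on the current confounder.
rng = np.random.default_rng(0)
n = 20_000
x1 = rng.normal(size=n)
a1 = np.where(rng.random(n) < expit(0.5 * x1), 1, -1)
x2 = 0.5 * x1 + 0.5 * a1 + rng.normal(size=n)
a2 = np.where(rng.random(n) < expit(0.5 * x2), 1, -1)
y = x1 + x2 + a2 * (1.0 - x2) + rng.normal(size=n)

psi1, psi2, rule1, rule2 = q_learning_two_stage(x1, a1, x2, a2, y)
print("stage-2 blip estimates:", psi2)   # true values in this simulation: (1, -1)
print("stage-1 blip estimates:", psi1)
```

Because the stage-2 model is correctly specified in this simulation, the stage-2 estimates should land near (1, -1). The stage-1 linear model is only a working approximation of the true stage-1 Q-function here. The `np.abs(blip2)` step is where maximizing over stage-2 treatments makes the stage-1 target non-smooth, which is the source of the non-regularity that arises when the stage-2 blip is near zero for some patients.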
