Dynamic treatment regimes: Technical challenges and applications

Dynamic treatment regimes are of growing interest across the clinical sciences because they provide one way to operationalize, and thus inform, sequential personalized clinical decision making. Formally, a dynamic treatment regime is a sequence of decision rules, one per stage of clinical intervention. Each decision rule maps up-to-date patient information to a recommended treatment. We briefly review a variety of approaches for using data to construct the decision rules. We then review a critical inferential challenge that results from nonregularity, which often arises in this area. In particular, nonregularity arises in inference for the parameters of the optimal dynamic treatment regime: the asymptotic (limiting) distribution of the estimators is sensitive to local perturbations. We propose and evaluate a locally consistent Adaptive Confidence Interval (ACI) for the parameters of the optimal dynamic treatment regime. We use data from the Adaptive Pharmacological and Behavioral Treatments for Children with ADHD Trial as an illustrative example. We conclude by highlighting and discussing emerging theoretical problems in this area.
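To make the notion of sensitivity to local perturbations concrete, the following is a minimal sketch built on a toy problem that is an assumption of this illustration, not the estimator studied in the paper: estimating max(mu, 0) by max(xbar, 0), where xbar is the mean of n unit-variance normal observations. The function name `scaled_error_mean`, the local parameterization mu_n = c / sqrt(n), and all default values are hypothetical choices made for the example.

```python
import math
import random


def scaled_error_mean(c, n=400, reps=20000, seed=0):
    """Monte Carlo mean of sqrt(n) * (max(xbar, 0) - max(mu_n, 0)),
    where mu_n = c / sqrt(n) is a local perturbation of mu = 0.

    Toy illustration only: xbar is drawn directly from N(mu_n, 1/n),
    the exact sampling distribution of the mean of n unit-variance normals.
    """
    rng = random.Random(seed)
    root_n = math.sqrt(n)
    mu_n = c / root_n
    total = 0.0
    for _ in range(reps):
        xbar = rng.gauss(mu_n, 1.0 / root_n)
        total += root_n * (max(xbar, 0.0) - max(mu_n, 0.0))
    return total / reps


# The mean of the scaled error changes with c even though mu_n -> 0 for every c.
for c in (0.0, 0.5, 2.0):
    print(f"c = {c}: mean scaled error is about {scaled_error_mean(c):.3f}")
```

In this toy setting the scaled error equals max(Z + c, 0) - max(c, 0) for a standard normal Z, so its distribution depends on c. At c = 0 its mean is 1/sqrt(2*pi), roughly 0.40, and it shrinks toward zero as c grows. A single normal approximation therefore cannot be accurate for all values of c, which is the kind of behavior that adaptive intervals are meant to handle.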
