Reward ignorant modeling of dynamic treatment regimes

Personalized medicine seeks to optimize patient outcomes by tailoring treatments to patient-level characteristics. This approach is formalized by dynamic treatment regimes (DTRs): decision rules that take patient information as input and output recommended treatment decisions. The DTR literature has seen the development of increasingly sophisticated causal inference techniques that attempt to address the limitations of the typically observational datasets available. Often overlooked, however, is that in practice most patients may be expected to receive optimal or near-optimal treatment, and so the outcome used in a typical DTR analysis may provide limited information. In light of this, we propose considering a more standard analysis: ignore the outcome and elicit an optimal DTR by modeling the observed treatment as a function of relevant covariates. This offers a far simpler analysis and, in some settings, improved identification of the optimal treatment. To distinguish this approach from more traditional DTR analyses, we term it reward ignorant modeling. We also introduce the concept of multimethod analysis, whereby different analysis methods are used in settings with multiple treatment decisions. We demonstrate these ideas through a variety of simulation studies and through an analysis of data from the International Warfarin Pharmacogenetics Consortium, which also motivated this work.
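As a rough illustration of the core idea (modeling the observed treatment as a function of covariates while ignoring the outcome), the following is a minimal sketch on simulated data. It is not the authors' implementation: the covariates (`age`, `genotype`), the linear form of `optimal_dose`, the noise level, and the helper `recommend_dose` are all assumptions invented for this example, and ordinary least squares stands in for whatever treatment model an analyst would actually choose.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical covariates: age in decades and a binary genotype indicator.
age = rng.uniform(2, 9, size=n)
genotype = rng.integers(0, 2, size=n)


def optimal_dose(age, genotype):
    """Assumed 'true' optimal dose rule, invented for this illustration."""
    return 6.0 - 0.3 * age - 1.5 * genotype


# Observed doses: clinicians are assumed to prescribe near-optimally,
# so the observed dose is the optimal dose plus modest noise.
observed_dose = optimal_dose(age, genotype) + rng.normal(0, 0.5, size=n)

# Reward ignorant step: regress the observed treatment on the covariates.
# No outcome variable appears anywhere in the fit.
X = np.column_stack([np.ones(n), age, genotype])
coef, *_ = np.linalg.lstsq(X, observed_dose, rcond=None)


def recommend_dose(age, genotype):
    """Estimated decision rule: the fitted treatment model itself."""
    return coef[0] + coef[1] * age + coef[2] * genotype


print("estimated coefficients:", np.round(coef, 3))
print("recommended dose (age 5 decades, genotype 1):",
      round(float(recommend_dose(5.0, 1)), 2))
```

Under these assumptions the fitted coefficients should land close to those of the assumed rule, because the observed treatments are themselves centered on the optimum. If prescribing were systematically far from optimal, the same fit would reproduce that suboptimal practice, which is why the abstract claims an advantage only in some settings.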
