Set-valued dynamic treatment regimes for competing outcomes

Dynamic treatment regimes (DTRs) operationalize the clinical decision process as a sequence of functions, one for each clinical decision, where each function maps up-to-date patient information to a single recommended treatment. Current methods for estimating optimal DTRs, for example Q-learning, require the specification of a single outcome by which the "goodness" of competing dynamic treatment regimes is measured. However, this is an over-simplification of the goal of clinical decision making, which aims to balance several potentially competing outcomes, for example, symptom relief and side-effect burden. When there are competing outcomes and patients do not know or cannot communicate their preferences, formation of a single composite outcome that correctly balances the competing outcomes is not possible. This problem also occurs when patient preferences evolve over time. We propose a method for constructing DTRs that accommodates competing outcomes by recommending sets of treatments at each decision point. Formally, we construct a sequence of set-valued functions that take as input up-to-date patient information and give as output a recommended subset of the possible treatments. For a given patient history, the recommended set of treatments contains all treatments that produce non-inferior outcome vectors. Constructing these set-valued functions requires solving a non-trivial enumeration problem. We offer an exact enumeration algorithm by recasting the problem as a linear mixed integer program. The proposed methods are illustrated using data from the CATIE schizophrenia study.
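To make the idea of a recommended set concrete, the following is a minimal sketch of one decision point. It is not the estimation procedure or the mixed integer program described above. It assumes that an estimated outcome vector is already available for each treatment, that higher values are better on every outcome, and that "non-inferior" is simplified to plain Pareto non-dominance with no clinically meaningful difference threshold. The function names and the example numbers are hypothetical.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if outcome vector a is at least as good as b on every
    outcome and strictly better on at least one (higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def recommended_set(outcomes: Dict[str, Sequence[float]]) -> List[str]:
    """Return, in sorted order, every treatment whose estimated outcome
    vector is not dominated by that of any other treatment.

    `outcomes` maps a treatment label to its estimated outcome vector
    for one fixed patient history (an assumed input for illustration).
    """
    keep = []
    for treatment, vector in outcomes.items():
        dominated = any(
            dominates(other_vector, vector)
            for other, other_vector in outcomes.items()
            if other != treatment
        )
        if not dominated:
            keep.append(treatment)
    return sorted(keep)


# Hypothetical estimates: (symptom relief, freedom from side effects).
example = {"A": (0.8, 0.2), "B": (0.5, 0.6), "C": (0.4, 0.5)}
print(recommended_set(example))  # ['A', 'B']
```

In this illustration treatment C is excluded because B is better on both outcomes, while A and B are both retained because each is better on one outcome and worse on the other. That is the situation in which a set-valued regime leaves the final choice to the patient and clinician.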
