Outcome-Weighted Learning for Personalized Medicine with Multiple Treatment Options

To achieve personalized medicine, one can consider an individualized treatment strategy that assigns treatment based on an individual's characteristics so as to yield the largest benefit. Recently, a machine learning approach, O-learning (outcome-weighted learning), has been proposed to estimate an optimal individualized treatment rule (ITR), but it was developed to make binary decisions and is thus limited to comparing two treatments. When many treatment options are available, existing methods need to be adapted by transforming a multiple treatment selection problem into multiple binary treatment selections, for example, via one-vs-one or one-vs-all comparisons. However, combining multiple binary treatment selection rules into a single decision rule requires careful consideration, because it is known in the multicategory learning literature that some approaches may lead to ambiguous decision rules. In this work, we propose a novel and efficient method that generalizes outcome-weighted learning from binary treatment to multi-treatment settings. We solve the multiple treatment selection problem via sequential weighted support vector machines. We prove that the resulting ITR is Fisher consistent and obtain the convergence rate of the estimated value function to the true optimal value; that is, the estimated treatment rule leads to the maximal benefit as the sample size goes to infinity. We conduct simulations to demonstrate that the proposed method has superior performance in terms of lower mis-allocation rates and improved expected values. An application to a three-arm randomized trial of major depressive disorder shows that an ITR tailored to individual patients' expectancy of treatment efficacy, their baseline depression severity, and other characteristics reduces depressive symptoms more than non-personalized treatment strategies (e.g., treating all patients with combined pharmacotherapy and psychotherapy).
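
The "expected value" of a treatment rule mentioned above can be illustrated with a small sketch. It assumes a randomized trial with known assignment probabilities and uses a normalized inverse-probability-weighted average, a common estimator in the outcome-weighted learning literature. It is not the sequential weighted SVM procedure proposed in this work, and the function name, the threshold rule, and the toy data are all hypothetical.

```python
from typing import Callable, Sequence


def estimate_value(
    covariates: Sequence[float],
    treatments: Sequence[int],
    outcomes: Sequence[float],
    propensities: Sequence[float],
    rule: Callable[[float], int],
) -> float:
    """Normalized inverse-probability-weighted value of a treatment rule.

    Only patients whose observed treatment agrees with the rule contribute;
    each is weighted by 1 / P(observed treatment | covariates).
    """
    weighted_outcomes = 0.0
    total_weight = 0.0
    for x, a, r, p in zip(covariates, treatments, outcomes, propensities):
        if rule(x) == a:
            weighted_outcomes += r / p
            total_weight += 1.0 / p
    if total_weight == 0.0:
        raise ValueError("no patient received the treatment the rule recommends")
    return weighted_outcomes / total_weight


# Hypothetical toy data: one covariate, three arms (0, 1, 2), equal randomization.
covariates = [0.2, 0.8, 0.5, 0.9, 0.1, 0.4]
treatments = [0, 2, 1, 0, 1, 1]
outcomes = [3.0, 5.0, 2.0, 4.0, 1.0, 6.0]  # larger is better in this sketch
propensities = [1.0 / 3.0] * 6


def threshold_rule(x: float) -> int:
    """Hypothetical individualized rule based on a single covariate."""
    if x < 0.3:
        return 0
    if x < 0.6:
        return 1
    return 2


def always_arm_1(x: float) -> int:
    """Non-personalized comparator: every patient gets arm 1."""
    return 1


personalized_value = estimate_value(
    covariates, treatments, outcomes, propensities, threshold_rule
)
one_size_value = estimate_value(
    covariates, treatments, outcomes, propensities, always_arm_1
)
print(personalized_value, one_size_value)
```

Comparing the two estimates mirrors, on made-up numbers, the kind of comparison between a personalized rule and a one-size-fits-all strategy described in the abstract. Whether larger or smaller outcomes are preferable depends on how the outcome is coded.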
