Residual Weighted Learning for Estimating Individualized Treatment Rules

ABSTRACT Personalized medicine has received increasing attention among statisticians, computer scientists, and clinical practitioners. A major component of personalized medicine is the estimation of individualized treatment rules (ITRs). Recently, Zhao et al. proposed outcome weighted learning (OWL) to construct ITRs that directly optimize the clinical outcome. Although OWL opens the door to applying machine learning techniques to the estimation of optimal treatment regimes, it has three performance problems. (1) The ITR estimated by OWL is affected by a simple shift of the outcome. (2) The rule from OWL tends to retain the treatment assignments that subjects actually received. (3) OWL has no variable selection mechanism. All three weaken the finite sample performance of OWL. In this article, we propose a general framework, called residual weighted learning (RWL), to alleviate these problems and hence improve finite sample performance. Unlike OWL, which weights misclassification errors by clinical outcomes, RWL weights these errors by residuals of the outcome from a regression fit on clinical covariates excluding treatment assignment. We use the smoothed ramp loss function in RWL and provide a difference of convex (d.c.) algorithm to solve the corresponding nonconvex optimization problem. By estimating residuals with linear models or generalized linear models, RWL can effectively handle different types of outcomes, such as continuous, binary, and count outcomes. We also propose variable selection methods for both linear and nonlinear rules to further improve performance. We show that the resulting estimator of the treatment rule is consistent. We further obtain a rate of convergence for the difference between the expected outcome under the estimated ITR and that under the optimal treatment rule. The performance of the proposed RWL methods is illustrated in simulation studies and in an analysis of cystic fibrosis clinical trial data.
Supplementary materials for this article are available online.
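
The residual weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the piecewise form of the smoothed ramp loss, the use of a single covariate with a simple least-squares line as the regression fit, a constant propensity of 0.5 (as in a 1:1 randomized trial), and all function names and toy data. The sketch shows why residual weights are unaffected by a constant shift of the outcome: the intercept of the regression absorbs the shift.

```python
def smoothed_ramp_loss(u):
    """Assumed smoothed ramp loss: 0 for u >= 1, 2 for u < -1,
    with two quadratic pieces joining them smoothly in between."""
    if u >= 1.0:
        return 0.0
    if u >= 0.0:
        return (1.0 - u) ** 2
    if u >= -1.0:
        return 2.0 - (1.0 + u) ** 2
    return 2.0


def main_effect_residuals(x, y):
    """Residuals of outcome y from a least-squares line in covariate x.
    Treatment assignment is deliberately left out of the fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def rwl_empirical_risk(residuals, treatments, scores, propensity=0.5):
    """Average of (residual / propensity) * T(a * f(x)) over subjects,
    where treatments a are coded +1/-1 and scores are values of f(x)."""
    total = 0.0
    for r, a, f in zip(residuals, treatments, scores):
        total += (r / propensity) * smoothed_ramp_loss(a * f)
    return total / len(residuals)


# Hypothetical toy data: one covariate, treatments coded +1 / -1.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
a = [1, -1, 1, -1, 1, -1]
y = [1.0, 3.5, 2.0, 6.0, 4.5, 8.0]
scores = [xi - 2.5 for xi in x]  # a hypothetical linear rule f(x) = x - 2.5

res = main_effect_residuals(x, y)
res_shifted = main_effect_residuals(x, [yi + 100.0 for yi in y])

print("risk, original outcome:", rwl_empirical_risk(res, a, scores))
print("risk, shifted outcome: ", rwl_empirical_risk(res_shifted, a, scores))
```

Up to floating-point rounding, the two printed risks agree, whereas weighting directly by the outcome would change the objective after the shift. Because residuals can be negative, the weighted objective is nonconvex even for a convex loss, which is consistent with the abstract's use of a d.c. algorithm; the optimization itself is not sketched here.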
