Non-parametric individual treatment effect estimation for survival data with random forests

MOTIVATION: Personalized medicine often relies on accurate estimation of a treatment effect for specific subjects. This estimation can be based on the subject's baseline covariates, but additional complications arise when the response is a time-to-event outcome subject to censoring. In this paper, the treatment effect is measured as the difference between the mean survival time of a treated subject and the mean survival time of a control subject. We propose a new random forest method for estimating the individual treatment effect with survival data. The forest is built from individual trees that use a splitting rule specifically designed to partition the data according to the individual treatment effect. For a new subject, the forest provides a set of similar subjects from the training data set, which can then be used to compute an estimate of the individual treatment effect with any adequate method.

RESULTS: The merits of the proposed method are investigated in a simulation study, where it is compared to numerous competitors, including recent state-of-the-art methods. The results indicate that the proposed method performs very well and stably in estimating individual treatment effects. Two example applications, one with colon cancer data and one with breast cancer data, show that the proposed method can detect a treatment effect in a sub-population even when the overall effect is small or nonexistent.

AVAILABILITY AND IMPLEMENTATION: The authors are working on an R package implementing the proposed method, which will be available soon. In the meantime, the code can be obtained from the first author at sami.tabib@hec.ca.

SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.
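As an illustration of the final step described above, the following is a minimal sketch, not the authors' implementation, of how an individual treatment effect might be computed once a set of similar training subjects is available. It assumes the "adequate method" is the difference between Kaplan-Meier restricted mean survival times of the treated and control neighbours. The function names, the truncation time `tau`, and the toy data are all hypothetical; the forest that would select the neighbours is not shown.

```python
import numpy as np


def km_restricted_mean(time, event, tau):
    """Area under the Kaplan-Meier survival curve on [0, tau].

    time  : observed follow-up times
    event : 1 if the event was observed, 0 if the time is censored
    tau   : truncation time (an assumption of this sketch)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # Distinct observed event times up to tau, in increasing order.
    event_times = np.unique(time[event & (time <= tau)])
    surv, prev_t, area = 1.0, 0.0, 0.0
    for t in event_times:
        # Add the rectangle under the current step of the curve.
        area += surv * (t - prev_t)
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        surv *= 1.0 - deaths / at_risk
        prev_t = t
    # Last step, from the final event time up to tau.
    area += surv * (tau - prev_t)
    return area


def neighbourhood_ite(time, event, treated, tau):
    """Treated-minus-control difference in restricted mean survival
    among the neighbours (hypothetical helper for illustration)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    treated = np.asarray(treated, dtype=bool)
    rmst_treated = km_restricted_mean(time[treated], event[treated], tau)
    rmst_control = km_restricted_mean(time[~treated], event[~treated], tau)
    return rmst_treated - rmst_control


# Toy neighbourhood: two treated and two control subjects, no censoring.
ite = neighbourhood_ite(
    time=[2.0, 4.0, 1.0, 3.0],
    event=[1, 1, 1, 1],
    treated=[1, 1, 0, 0],
    tau=10.0,
)
print(ite)
```

Without censoring and with `tau` beyond the last event, the restricted mean reduces to the ordinary sample mean, so the toy call compares the mean survival of the treated neighbours with that of the control neighbours. With censored data the choice of `tau` matters and would need to be justified for the application at hand.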
