Estimating Heterogeneous Treatment Effect on Multivariate Responses Using Random Forests

Estimating individualized treatment effects has become one of the most popular topics in the statistics and machine learning communities in recent years. Most existing methods focus on modeling heterogeneous treatment effects for univariate outcomes. However, many biomedical studies are interested in studying multiple highly correlated endpoints at the same time. We propose a random forest model that simultaneously estimates individualized treatment effects on multivariate outcomes. We consider a common study design in which covariates and outcomes are measured both before and after the intervention. The proposed model uses oblique splitting rules to partition the population space into neighborhoods that experience distinct treatment effects. An extensive simulation study suggests that the proposed method outperforms existing methods in various nonlinear settings. We further apply the proposed method to two nutrition studies investigating the effects of food consumption on gastrointestinal microbiota composition and clinical biomarkers. The method is implemented in the freely available R package MOTE.RF (https://github.com/boyiguo1/MOTE.RF).
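
To make the idea of an oblique split concrete, the following is a minimal illustrative sketch in Python. It is not the MOTE.RF implementation and does not use its API. It assumes a toy data set with two covariates, a binary treatment indicator, and a two-dimensional outcome recorded as the change from pre- to post-intervention. The split direction `w` and threshold `c` are picked by hand here; how the package chooses them is not described in this abstract. All function and variable names are hypothetical.

```python
from itertools import product


def project(x, w):
    """Linear combination w . x that defines an oblique split direction."""
    return sum(wi * xi for wi, xi in zip(w, x))


def mean_vector(rows):
    """Column-wise mean of a list of equal-length tuples."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def node_effect(changes, treated):
    """Treated-minus-control difference in mean outcome change, per outcome."""
    t_rows = [d for d, a in zip(changes, treated) if a == 1]
    c_rows = [d for d, a in zip(changes, treated) if a == 0]
    return [t - c for t, c in zip(mean_vector(t_rows), mean_vector(c_rows))]


def split_effects(X, changes, treated, w, c):
    """Split on w . x <= c and return the effect vector in [left, right]."""
    sides = [project(x, w) <= c for x in X]
    effects = []
    for side in (True, False):
        idx = [i for i, s in enumerate(sides) if s == side]
        effects.append(node_effect([changes[i] for i in idx],
                                   [treated[i] for i in idx]))
    return effects


# Toy data (an assumption for illustration): only treated subjects with
# x1 + x2 > 0 respond, and their two outcomes change by (2.0, 1.0).
grid = [-1.0, -0.5, 0.5, 1.0]
X, treated, changes = [], [], []
for x1, x2 in product(grid, grid):
    for a in (0, 1):
        responds = a == 1 and x1 + x2 > 0
        X.append((x1, x2))
        treated.append(a)
        changes.append((2.0, 1.0) if responds else (0.0, 0.0))

# Oblique split along x1 + x2 versus an axis-aligned split on x1 alone.
oblique_left, oblique_right = split_effects(X, changes, treated, (1.0, 1.0), 0.0)
axis_left, axis_right = split_effects(X, changes, treated, (1.0, 0.0), 0.0)

print("oblique:", oblique_left, oblique_right)
print("axis-aligned:", axis_left, axis_right)
```

In this toy setting the oblique split separates responders from non-responders in a single cut, whereas the axis-aligned split leaves both child nodes with a mixture of the two groups and would need further splits to recover the same boundary.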
