A regression tree approach to identifying subgroups with differential treatment effects

In the fight against hard‐to‐treat diseases such as cancer, it is often difficult to discover new treatments that benefit all subjects. For regulatory agency approval, it is more practical to identify subgroups of subjects for whom the treatment has an enhanced effect. Regression trees are natural for this task because they partition the data space. We briefly review existing regression tree algorithms. Then, we introduce three new ones that are practically free of selection bias and are applicable to data from randomized trials with two or more treatments, censored response variables, and missing values in the predictor variables. The algorithms extend the generalized unbiased interaction detection and estimation (GUIDE) approach by using three key ideas: (i) treatment as a linear predictor, (ii) chi‐squared tests to detect residual patterns and lack of fit, and (iii) proportional hazards modeling via Poisson regression. Importance scores with thresholds for identifying influential variables are obtained as by‐products. A bootstrap technique is used to construct confidence intervals for the treatment effects in each node. The methods are compared using real and simulated data. Copyright © 2015 John Wiley & Sons, Ltd.
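Key ideas (i) and (ii) can be illustrated with a minimal sketch. It is a simplified stand-in, not the algorithms' actual split-selection procedure. It fits a model with treatment as the only predictor (for a categorical treatment this is the per-arm mean), cross-tabulates the signs of the residuals against quantile groups of a candidate covariate, and computes a Pearson chi-squared statistic. The function name `residual_sign_chi2`, the choice of four groups, and the toy data are assumptions made for illustration.

```python
def residual_sign_chi2(x, treat, y, n_groups=4):
    """Pearson chi-squared statistic for residual signs vs. quantile groups of x.

    Hypothetical helper for illustration: residuals come from a model with
    treatment as the only predictor, i.e. the mean response in each arm.
    """
    # Fit y ~ treatment: the fitted value is the mean of each treatment arm.
    arms = sorted(set(treat))
    mean = {
        a: sum(yi for yi, t in zip(y, treat) if t == a)
        / sum(1 for t in treat if t == a)
        for a in arms
    }
    positive = [yi - mean[t] > 0 for yi, t in zip(y, treat)]

    # Assign each observation to a quantile group of x by its rank.
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    group = [0] * n
    for rank, i in enumerate(order):
        group[i] = rank * n_groups // n

    # Build the 2 x n_groups contingency table: residual sign by x group.
    table = [[0] * n_groups for _ in range(2)]
    for pos, g in zip(positive, group):
        table[int(pos)][g] += 1

    # Pearson chi-squared statistic; cells with zero expected count are skipped.
    row = [sum(r) for r in table]
    col = [table[0][g] + table[1][g] for g in range(n_groups)]
    stat = 0.0
    for r in range(2):
        for g in range(n_groups):
            expected = row[r] * col[g] / n
            if expected > 0:
                stat += (table[r][g] - expected) ** 2 / expected
    return stat


# Toy data (assumed): the response jumps when x >= 20, which the
# treatment-only model cannot capture, so the residual signs track x.
x = list(range(40))
treat = [i % 2 for i in x]
y = [t + 5.0 * (xi >= 20) for xi, t in zip(x, treat)]
print(residual_sign_chi2(x, treat, y))  # 40.0
```

A large statistic indicates that the treatment-only fit leaves a residual pattern in `x`, which makes `x` a candidate split variable. Because every covariate is screened with the same kind of contingency-table test rather than by searching over all possible split points, this style of selection is what keeps the approach largely free of selection bias.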
