Tree-based identification of subgroups for time-varying covariate survival data

Classification and regression tree analyses identify subsets of a sample that differ on an outcome. Subsets are discriminated by recursive binary splitting on a set of covariates, which can capture interactions among covariates that are not easily identified with standard model-building techniques. Applying classification and regression trees to epidemiological data can be problematic, however, because there is often a need to adjust for potential confounders and to account for time-varying covariates in the context of right-censored survival data. While classification and regression tree variants exist separately for survival analysis, for time-varying covariates, and for incorporating possible confounders, examples that combine all three are lacking. We propose a method to identify subsets of time-varying covariate risk factors that affect survival while adjusting for possible confounders. The technique is demonstrated on data from the Bypass Angioplasty Revascularization Investigation 2 Diabetes (BARI 2D) clinical trial to find combinations of modifiable, time-varying cardiac risk factors (e.g. smoking status, blood pressure, lipid levels, and HbA1c level) that are associated with time-to-event clinical outcomes.
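The splitting step described above can be sketched as a "goodness of split" log-rank statistic computed on counting-process (start, stop, event) records, which is the usual way time-varying covariates are encoded for right-censored survival data. The sketch below is illustrative only and is not the authors' implementation: the data frame, column names (id, start, stop, event, sbp), and candidate cutpoints are hypothetical, and confounder adjustment is omitted for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical counting-process records: each row is an interval over which a
# subject's time-varying covariate (here, systolic blood pressure) is constant.
data = pd.DataFrame({
    "id":    [1, 1, 2, 2, 3, 4, 4, 5],
    "start": [0, 6, 0, 3, 0, 0, 4, 0],
    "stop":  [6, 14, 3, 9, 11, 4, 7, 12],
    "event": [0, 1, 0, 1, 0, 0, 1, 0],
    "sbp":   [118, 142, 150, 155, 125, 130, 147, 120],
})

def logrank_split_stat(df, covariate, cutpoint):
    """Two-sample log-rank statistic for the binary split covariate <= cutpoint,
    computed on counting-process data so the risk set at each event time
    contains only intervals that are at risk (start < t <= stop)."""
    group = (df[covariate] <= cutpoint).astype(int)
    event_times = np.sort(df.loc[df["event"] == 1, "stop"].unique())
    obs_minus_exp, var = 0.0, 0.0
    for t in event_times:
        at_risk = (df["start"] < t) & (df["stop"] >= t)
        n = at_risk.sum()                                   # total at risk
        n1 = (at_risk & (group == 1)).sum()                 # at risk in group 1
        d = ((df["stop"] == t) & (df["event"] == 1)).sum()  # events at t
        d1 = ((df["stop"] == t) & (df["event"] == 1) & (group == 1)).sum()
        if n > 1:
            obs_minus_exp += d1 - d * n1 / n
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / var if var > 0 else 0.0

# A tree node would scan candidate cutpoints and keep the split with the
# largest statistic, then recurse within each resulting subgroup.
cuts = np.quantile(data["sbp"], [0.25, 0.5, 0.75])
best = max(((c, logrank_split_stat(data, "sbp", c)) for c in cuts),
           key=lambda x: x[1])
print("best cutpoint and log-rank statistic:", best)
```

In a full procedure of the kind the abstract describes, this split search would be repeated recursively within each daughter node, with the split criterion modified to adjust for the chosen confounders.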
