Regression Trees for Cumulative Incidence Functions

The use of cumulative incidence functions for characterizing the risk of one type of event in the presence of others has become increasingly popular over the past decade. The problems of modeling, estimation, and inference have been treated using parametric, nonparametric, and semiparametric methods. Efforts to develop suitable extensions of machine learning methods, such as regression trees and related ensemble methods, have begun only recently. In this paper, we develop a novel approach to building regression trees for estimating cumulative incidence curves in a competing risks setting. The proposed methods employ augmented estimators of the Brier score risk as the primary basis for building and pruning trees, and they are easily implemented using the R statistical software package. Simulation studies demonstrate the utility of our approach in the competing risks setting. Data from the Radiation Therapy Oncology Group (trial 9410) are used to illustrate these new methods.
