Censored Quantile Regression Forest

Random forests are a powerful non-parametric regression method, but their use is severely limited in the presence of randomly censored observations; applied naively, they can exhibit poor predictive performance because of the bias that censoring induces. Based on a locally adaptive representation of random forests, we develop a regression adjustment for randomly censored regression quantile models. The adjustment is based on a new estimating equation that adapts to censoring and reduces to the quantile score whenever the data are not censored. The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of the time-to-event without any parametric modeling assumption. We establish its consistency under mild model specifications. Numerical studies showcase a clear advantage of the proposed procedure.
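The censoring-adapted estimating equation is not spelled out in this abstract, so the following is only a minimal sketch of the uncensored special case it is said to reduce to: the quantile (check) score, together with a weighted empirical quantile of the kind a forest's local weights would produce. The function names and the toy weights are illustrative assumptions, not the authors' implementation, and no censoring adjustment is attempted here.

```python
def quantile_score(y, q, tau):
    """Average check (pinball) loss of a candidate quantile q at level tau."""
    total = 0.0
    for yi in y:
        u = yi - q
        # rho_tau(u) = u * (tau - 1{u < 0})
        total += u * (tau - (1.0 if u < 0 else 0.0))
    return total / len(y)


def weighted_quantile(y, weights, tau):
    """Smallest response whose cumulative weight reaches a fraction tau.

    `weights` stands in for forest-style local weights at a query point
    (hypothetical here; any non-negative weights with a positive sum work).
    """
    pairs = sorted(zip(y, weights))
    threshold = tau * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, w in pairs:
        cumulative += w
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


# Toy data: with equal weights, the weighted quantile is the sample quantile,
# and it attains the smallest quantile score among the observed values.
responses = [3.0, 1.0, 2.0, 5.0, 4.0]
equal_weights = [1.0] * len(responses)
median = weighted_quantile(responses, equal_weights, 0.5)
scores = {q: quantile_score(responses, q, 0.5) for q in responses}
```

With complete (uncensored) data, minimizing the weighted check loss and reading off the weighted quantile coincide; the paper's contribution concerns how to modify this step when some responses are censored.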
