Bayesian Survival Tree Ensembles with Submodel Shrinkage

We consider Bayesian nonparametric estimation of a survival time subject to right-censoring in the presence of potentially high-dimensional predictors. We argue that existing approaches, such as random survival forests and current Bayesian nonparametric methods, have several drawbacks, including computational difficulties, a lack of known theoretical properties, and ineffectiveness at filtering out irrelevant predictors. We propose two models based on the Bayesian additive regression trees (BART) framework. The first, Modulated BART (MBART), is fully nonparametric and models the failure time as the time of the first event of a non-homogeneous Poisson process. The second, CoxBART, uses a Bayesian implementation of Cox’s partial likelihood. These models are adapted to high-dimensional predictors, have default prior specifications, and require only simple modifications of existing BART methods to implement. We show the effectiveness of these methods on simulated and benchmark datasets. We also establish that, for a simplified variant of MBART, the posterior distribution contracts at a near-minimax optimal rate in a high-dimensional sparse asymptotic regime.
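
To make the "first event of a non-homogeneous Poisson process" construction concrete, the following is a minimal sketch and not the authors' implementation. It samples a right-censored failure time by thinning, under the assumption that the hazard is bounded on the follow-up window. The function name `first_event_time`, the example hazard `lambda t: t`, and the bound are hypothetical choices for illustration; in MBART the hazard would depend on the predictors through a tree ensemble, which the abstract does not specify in detail.

```python
import math
import random


def first_event_time(hazard, hazard_bound, t_max, rng):
    """Sample the first event of a non-homogeneous Poisson process by thinning.

    Assumes 0 <= hazard(t) <= hazard_bound for all t in [0, t_max].
    Returns (time, observed); observed is False when no event occurs
    before t_max, i.e. the time is right-censored at t_max.
    """
    t = 0.0
    while True:
        # Propose the next point of a homogeneous process with rate hazard_bound.
        t += rng.expovariate(hazard_bound)
        if t >= t_max:
            return t_max, False
        # Accept the proposal with probability hazard(t) / hazard_bound.
        if rng.random() * hazard_bound <= hazard(t):
            return t, True


# Illustrative (assumed) hazard: lambda(t) = t, so S(t) = exp(-t**2 / 2).
rng = random.Random(0)
draws = [first_event_time(lambda t: t, 3.0, 3.0, rng) for _ in range(20000)]
empirical_s1 = sum(time > 1.0 for time, _ in draws) / len(draws)
theoretical_s1 = math.exp(-0.5)
```

Under this construction the survival function is S(t) = exp(-∫₀ᵗ λ(s) ds), so `empirical_s1` should be close to `theoretical_s1` up to Monte Carlo error.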
