BoXHED 2.0: Scalable boosting of functional data in survival analysis

Modern applications of survival analysis increasingly involve time-dependent covariates, which constitute a form of functional data. Learning from functional data generally involves repeated evaluations of time integrals, which is numerically expensive. In this work we propose a lightweight data preprocessing step that transforms functional data into nonfunctional data. Boosting implementations for nonfunctional data can then be used, whereby the required numerical integration comes for free as part of the training phase. We use this to develop BoXHED2.0, a substantial upgrade of the tree-boosted hazard package BoXHED1.0 [1]. BoXHED2.0 extends BoXHED1.0 to Aalen’s multiplicative intensity model, which covers censoring schemes far beyond right-censoring and also supports recurrent events data. It is also massively scalable, owing both to the preprocessing step and to the core components it borrows from XGBoost [2]. BoXHED2.0 supports the use of GPUs and multicore CPUs, and is available from GitHub: www.github.com/BoXHED.
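To illustrate how a preprocessing step can turn functional data into nonfunctional data, the following minimal Python sketch splits one subject's covariate trajectory into rows and evaluates a hazard log-likelihood as a plain sum over those rows. It is not the BoXHED2.0 implementation or API. It assumes that the covariate path is piecewise constant and that the log-hazard is constant within each row, so the time integral of the hazard reduces to a sum of hazard times row duration. The names `to_rows`, `neg_log_lik`, `t_start`, `t_end`, `x`, and `delta` are hypothetical.

```python
import math


def to_rows(times, values, end_time, event):
    """Split one subject's piecewise-constant covariate path into rows.

    times[k] is when the covariate took the value values[k]. The subject
    is observed until end_time; event is 1 if an event occurred there and
    0 if the observation was censored. (Hypothetical helper.)
    """
    rows = []
    for k, (t0, x) in enumerate(zip(times, values)):
        # A row ends at the next covariate change or at end_time.
        t1 = times[k + 1] if k + 1 < len(times) else end_time
        t1 = min(t1, end_time)
        if t1 <= t0:
            break  # nothing observed after end_time
        is_last = t1 == end_time
        rows.append({
            "t_start": t0,
            "t_end": t1,
            "x": x,
            # Only the final row can carry the event indicator.
            "delta": event if is_last else 0,
        })
    return rows


def neg_log_lik(rows, log_hazard):
    """Negative log-likelihood as a sum over rows.

    Assumes log_hazard(t_start, x) is constant on each row, so the
    integral of the hazard over a row is exp(F) * (t_end - t_start).
    """
    total = 0.0
    for r in rows:
        f = log_hazard(r["t_start"], r["x"])
        total += math.exp(f) * (r["t_end"] - r["t_start"]) - r["delta"] * f
    return total
```

Each row is an ordinary tabular record, so under these assumptions no numerical integration routine is called during fitting: the integral is absorbed into the per-row duration.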