BoXHED: Boosted eXact Hazard Estimator with Dynamic covariates

The proliferation of medical monitoring devices makes it possible to track health vitals at high frequency, enabling the development of dynamic health risk scores that change with the underlying readings. Survival analysis, and hazard estimation in particular, is well suited to analyzing this stream of data to predict disease onset as a function of the time-varying vitals. This paper introduces the software package BoXHED (pronounced 'box-head') for nonparametrically estimating hazard functions via gradient boosting. BoXHED 1.0 is a novel tree-based implementation of the generic estimator proposed in Lee, Chen, and Ishwaran (2017), which was designed to handle time-dependent covariates in a fully nonparametric manner. BoXHED is also the first publicly available software implementation of the Lee, Chen, and Ishwaran (2017) estimator. Applying BoXHED to cardiovascular disease onset data from the Framingham Heart Study reveals novel interaction effects among known risk factors, potentially resolving an open question in the clinical literature.
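The core idea, boosting a log-hazard F(t, x) under a likelihood risk while covariates change over time, can be illustrated with a small self-contained sketch. This is not BoXHED's API and not the exact estimator of Lee, Chen, and Ishwaran (2017). It assumes each subject's history has been split into epochs on which the covariate vector is constant, treats F as constant within each epoch (evaluated at the epoch start time, a simplification of the time integral in the true risk), and minimizes the negative log-likelihood risk sum_i [dur_i * exp(F_i) - delta_i * F_i] using XGBoost-style Newton steps on depth-1 trees. All function names and the toy data are hypothetical.

```python
import numpy as np


def likelihood_risk(F, dur, delta):
    """Negative log-likelihood when the log-hazard F is constant on each epoch."""
    return np.sum(dur * np.exp(F) - delta * F)


def fit_stump(Z, g, h, lam):
    """Best single split under a second-order (Newton) approximation of the risk.

    The parent term G**2 / (H + lam) is the same for every split, so it is
    omitted from the gain comparison.
    """
    G, H = g.sum(), h.sum()
    best_gain, best = -np.inf, None
    for j in range(Z.shape[1]):
        order = np.argsort(Z[:, j])
        zs, gs, hs = Z[order, j], g[order], h[order]
        gl, hl = np.cumsum(gs)[:-1], np.cumsum(hs)[:-1]  # left-child sums
        gr, hr = G - gl, H - hl  # right-child sums
        gain = gl**2 / (hl + lam) + gr**2 / (hr + lam)
        gain[zs[:-1] == zs[1:]] = -np.inf  # cannot split between tied values
        k = int(np.argmax(gain))
        if gain[k] > best_gain:
            best_gain = gain[k]
            thr = 0.5 * (zs[k] + zs[k + 1])
            # Newton leaf values: -sum(g) / (sum(h) + lam) on each side
            best = (j, thr, -gl[k] / (hl[k] + lam), -gr[k] / (hr[k] + lam))
    if best is None:  # no valid split: apply one constant Newton step
        step = -G / (H + lam)
        return (0, np.inf, step, step)
    return best


def boost_log_hazard(Z, dur, delta, n_rounds=100, lr=0.1, lam=1.0):
    """Boost F(t, x), starting from the constant-hazard MLE log(events / exposure).

    Z holds one row per epoch: [epoch start time, covariates...].
    """
    f0 = np.log(delta.sum() / dur.sum())
    F = np.full(len(dur), f0)
    stumps = []
    for _ in range(n_rounds):
        mu = dur * np.exp(F)  # expected number of events on each epoch
        g, h = mu - delta, mu  # gradient and Hessian of the risk w.r.t. F
        j, thr, lv, rv = fit_stump(Z, g, h, lam)
        stumps.append((j, thr, lr * lv, lr * rv))  # shrunken leaf values
        F = F + np.where(Z[:, j] <= thr, lr * lv, lr * rv)
    return f0, stumps


def predict_log_hazard(model, Z):
    """Sum the base value and every stump's contribution."""
    f0, stumps = model
    F = np.full(len(Z), f0)
    for j, thr, lv, rv in stumps:
        F = F + np.where(Z[:, j] <= thr, lv, rv)
    return F


def simulate_epochs(n=4000, seed=0):
    """Toy epochs of length <= 1: true hazard is 2.0 when x > 0 and 0.5 otherwise."""
    rng = np.random.default_rng(seed)
    t0 = rng.uniform(0.0, 5.0, size=n)  # epoch start time
    x = rng.normal(size=n)  # covariate value, held fixed within the epoch
    rate = np.where(x > 0, 2.0, 0.5)
    T = rng.exponential(1.0 / rate)  # time to event from the epoch start
    dur = np.minimum(T, 1.0)  # exposure: epoch ends at the event or after 1 unit
    delta = (T <= 1.0).astype(float)  # 1 if the event occurred in the epoch
    return np.column_stack([t0, x]), dur, delta


if __name__ == "__main__":
    Z, dur, delta = simulate_epochs()
    model = boost_log_hazard(Z, dur, delta)
    grid = np.array([[2.5, -1.0], [2.5, 1.0]])  # (t, x) query points
    print(np.exp(predict_log_hazard(model, grid)))  # estimated hazards at x = -1, +1
```

On this toy data the fitted hazards at x = -1 and x = +1 should land near the true values 0.5 and 2.0. Because the likelihood risk decomposes into a sum over epochs, covariates that change over time only add rows to Z rather than requiring a different loss.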

[1]  Junchao Ma, et al. Using the Shapes of Clinical Data Trajectories to Predict Mortality in ICUs, 2019, Critical Care Explorations.

[2]  Harald Binder, et al. Allowing for mandatory covariates in boosting estimation of sparse high-dimensional survival models, 2008, BMC Bioinformatics.

[3]  Lei Zheng, et al. Deep Recurrent Survival Analysis, 2018, AAAI.

[4]  M. Landray, et al. Evidence for Reverse Causality in the Association Between Blood Pressure and Cardiovascular Risk in Patients With Chronic Kidney Disease, 2017, Hypertension.

[5]  Ralph B. D'Agostino, et al. Framingham risk score and prediction of lifetime risk for coronary heart disease, 2004, The American Journal of Cardiology.

[6]  Thomas A. Gerds, et al. The c-index is not proper for the evaluation of t-year predicted risks, 2019, Biostatistics.

[7]  María Dolores Martínez Miranda, et al. Bandwidth selection in marker dependent kernel hazard estimation, 2013, Comput. Stat. Data Anal.

[8]  Peter Bühlmann, et al. Boosting Algorithms: Regularization, Prediction and Model Fitting, 2007, arXiv:0804.2752.

[9]  Hongzhe Li, et al. Boosting proportional hazards models using smoothing splines, with applications to high-dimensional microarray data, 2005, Bioinformatics.

[10]  Junchao Ma, et al. Development of Imminent Mortality Predictor for Advanced Cancer (IMPAC), a Tool to Predict Short-Term Mortality in Hospitalized Patients With Advanced Cancer, 2017, Journal of Oncology Practice.

[11]  Mihaela van der Schaar, et al. MATCH-Net: Dynamic Prediction in Survival Analysis using Convolutional Neural Networks, 2018, arXiv.

[12]  Ahmed M. Alaa, et al. Temporal Quilting for Survival Analysis, 2019, AISTATS.

[13]  O. Linton, et al. Kernel estimation in a nonparametric marker dependent hazard model, 1995.

[14]  Gretchen A. Stevens, et al. A novel risk score to predict cardiovascular disease risk in national populations (Globorisk): a pooled analysis of prospective cohorts and health examination surveys, 2015, The Lancet Diabetes & Endocrinology.

[15]  Mihaela van der Schaar, et al. Boosted Trees for Risk Prognosis, 2018, MLHC.

[16]  Adler J. Perotte, et al. Deep Survival Analysis, 2016, MLHC.

[17]  T. Dawber, et al. Epidemiological approaches to heart disease: the Framingham Study, 1951, American Journal of Public Health and the Nation's Health.

[18]  Tianqi Chen, et al. XGBoost: A Scalable Tree Boosting System, 2016, KDD.

[19]  N. Wong, et al. Hypertension and cardiovascular disease: contributions of the Framingham Heart Study, 2013, Global Heart.

[20]  Donald K. K. Lee, et al. Boosted Nonparametric Hazards with Time-Dependent Covariates, 2017, Annals of Statistics.

[21]  Mihaela van der Schaar, et al. Boosting Transfer Learning with Survival Data from Heterogeneous Domains, 2019, AISTATS.

[22]  G. Ridgeway. The State of Boosting, 1999.

[23]  A. Khera, et al. 2019 ACC/AHA Guideline on the Primary Prevention of Cardiovascular Disease, 2019, Circulation.

[24]  Hemant Ishwaran, et al. Random Survival Forests, 2008, Wiley StatsRef: Statistics Reference Online.