Identifying Important Risk Factors for Survival in Patient With Systolic Heart Failure Using Random Survival Forests

Background: Heart failure survival models typically are constructed using Cox proportional hazards regression. Regression modeling suffers from a number of limitations, including bias introduced by commonly used variable selection methods. We illustrate the value of an intuitive, robust approach to variable selection, random survival forests (RSF), in a large clinical cohort. RSF are a potentially powerful extension of classification and regression trees, with lower variance and bias than a single tree.

Methods and Results: We studied 2231 adult patients with systolic heart failure who underwent cardiopulmonary stress testing. During a mean follow-up of 5 years, 742 patients died. Thirty-nine demographic, cardiac and noncardiac comorbidity, and stress testing variables were analyzed as potential predictors of all-cause mortality. An RSF of 2000 trees was constructed, with each tree grown on a bootstrap sample of the original cohort. The most predictive variables were defined as those that split closest to the tree trunk (the root node), averaged over the forest. The RSF identified peak oxygen consumption, serum urea nitrogen, and treadmill exercise time as the 3 most important predictors of survival. The RSF predicted survival similarly to a conventional Cox proportional hazards model (out-of-bag C-index of 0.705 for RSF versus 0.698 for the Cox proportional hazards model).

Conclusions: In a cohort of patients with heart failure, an RSF model performed as well as a traditional Cox proportional hazards model and may offer clinicians a more intuitive way to identify important risk factors for all-cause mortality.
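Each tree in the forest is grown on a bootstrap sample: patients are drawn with replacement until the sample is as large as the cohort, and the patients never drawn are "out-of-bag" (OOB) for that tree. The sketch below is illustrative only and is not the study's code. The function name `bootstrap_with_oob` is hypothetical, and it uses 200 trees rather than the study's 2000 to keep the run fast. It shows that roughly 37% of patients are OOB for any given tree, which is what makes an OOB error estimate possible without a separate test set.

```python
import random


def bootstrap_with_oob(n_patients, n_trees, seed=0):
    """Draw one bootstrap sample per tree and record which patients are out-of-bag.

    Hypothetical helper for illustration; returns a list of (in_bag, oob) pairs.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_trees):
        # Sample n indices with replacement: the in-bag set used to grow this tree.
        in_bag = [rng.randrange(n_patients) for _ in range(n_patients)]
        in_bag_set = set(in_bag)
        # Patients never drawn are OOB for this tree and can be used to test it.
        oob = [i for i in range(n_patients) if i not in in_bag_set]
        samples.append((in_bag, oob))
    return samples


# 2231 patients as in the cohort; 200 trees (not 2000) only to keep the demo quick.
samples = bootstrap_with_oob(n_patients=2231, n_trees=200, seed=1)
avg_oob_fraction = sum(len(oob) for _, oob in samples) / (len(samples) * 2231)
print(round(avg_oob_fraction, 2))  # about 0.37
```

The OOB fraction approaches (1 - 1/n)^n, which tends to 1/e (about 0.368) for large cohorts.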
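Ranking variables by how close they split to the trunk is commonly formalized as minimal depth. For each tree, this is the depth of the first split on a variable, with the root at depth 0, and the values are averaged over the forest so that smaller averages indicate more predictive variables. The sketch below rests on several assumptions. It uses a toy nested-dict tree format, hypothetical variables `x1` to `x4`, and three made-up trees that do not come from the study data. A variable that never splits in a tree is assigned that tree's height, which is one possible convention for unused variables.

```python
from statistics import mean


def split(var, left=None, right=None):
    """Hypothetical toy node: a split on `var`; children are nodes or None (terminal node)."""
    return {"var": var, "left": left, "right": right}


def minimal_depths(node, depth=0, found=None):
    """Depth of the shallowest split on each variable in one tree (root = depth 0)."""
    if found is None:
        found = {}
    if node is None:
        return found
    var = node["var"]
    found[var] = min(depth, found.get(var, depth))
    minimal_depths(node["left"], depth + 1, found)
    minimal_depths(node["right"], depth + 1, found)
    return found


def tree_height(node):
    """Number of split levels in a tree (a single split has height 1)."""
    if node is None:
        return 0
    return 1 + max(tree_height(node["left"]), tree_height(node["right"]))


def rank_by_minimal_depth(trees, variables):
    """Average minimal depth over the forest; smaller values sit closer to the trunk."""
    depths = {v: [] for v in variables}
    for tree in trees:
        found = minimal_depths(tree)
        # Assumption: a variable that never splits in this tree gets the tree's height.
        penalty = tree_height(tree)
        for v in variables:
            depths[v].append(found.get(v, penalty))
    return sorted((mean(d), v) for v, d in depths.items())


# Three made-up trees over hypothetical variables x1..x4.
trees = [
    split("x1", split("x2"), split("x3", split("x4"))),
    split("x2", split("x1", split("x3"))),
    split("x1", split("x3"), split("x2")),
]
for avg, var in rank_by_minimal_depth(trees, ["x1", "x2", "x3", "x4"]):
    print(var, round(avg, 2))
# x1 0.33
# x2 0.67
# x3 1.33
# x4 2.33
```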
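An out-of-bag performance estimate scores each patient using only the trees that did not see that patient during training. The sketch below makes two simplifications. First, each tree is assumed to output a single scalar risk per patient, whereas an RSF produces a survival or cumulative hazard curve. Second, the function name `oob_ensemble_risk` and its inputs are hypothetical. Under those assumptions, it averages each patient's predictions over the trees for which that patient was OOB.

```python
def oob_ensemble_risk(tree_predictions, oob_sets):
    """Average each patient's predicted risk over only the trees where it was out-of-bag.

    tree_predictions[t][i]: scalar risk predicted by tree t for patient i (hypothetical)
    oob_sets[t]: indices of patients left out of tree t's bootstrap sample
    """
    n_patients = len(tree_predictions[0])
    totals = [0.0] * n_patients
    counts = [0] * n_patients
    for preds, oob in zip(tree_predictions, oob_sets):
        for i in oob:
            totals[i] += preds[i]
            counts[i] += 1
    # A patient that was in-bag for every tree has no OOB prediction (None).
    return [t / c if c else None for t, c in zip(totals, counts)]


tree_predictions = [
    [0.2, 0.8, 0.5],
    [0.4, 0.6, 0.1],
    [0.3, 0.9, 0.7],
]
oob_sets = [[0, 2], [1], [0]]
print(oob_ensemble_risk(tree_predictions, oob_sets))  # [0.25, 0.6, 0.5]
```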
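The C-index used to compare the two models measures how often predicted risk ranks patients in the same order as their observed survival. A value of 0.5 is no better than chance and 1.0 is perfect ranking. The sketch below computes Harrell's C-index for right-censored data on a toy example. It treats a pair as comparable only when the patient with the strictly shorter follow-up time died, counts ties in predicted risk as half-concordant, and, as a simplification, skips pairs with tied times. The data and the function name are hypothetical.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  follow-up time for each patient
    events: 1 if the patient died at that time, 0 if censored
    risks:  predicted risk (higher = expected to die sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a usable pair
        for j in range(n):
            if times[i] < times[j]:  # i died before j's follow-up ended
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: the third patient is censored at time 6.
print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.5, 0.7, 0.1]))  # 0.8
```

In the toy data, 5 pairs are comparable and 4 are concordant. The only discordant pair is the patient who died at time 4 with risk 0.5, ranked below the patient censored at time 6 with risk 0.7.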
