OBLIQUE RANDOM SURVIVAL FORESTS

We introduce and evaluate the oblique random survival forest (ORSF). The ORSF is an ensemble method for right-censored survival data that uses linear combinations of input variables to recursively partition a set of training data. Regularized Cox proportional hazards models are used to identify the linear combination of input variables at each recursive partitioning step. Benchmark results using simulated and real data indicate that the ORSF’s predicted risk function has high prognostic value in comparison to random survival forests, conditional inference forests, regression, and boosting. In an application to data from the Jackson Heart Study, we demonstrate variable and partial dependence using the ORSF and highlight characteristics of its 10-year predicted risk function for atherosclerotic cardiovascular disease events (ASCVD; stroke, coronary heart disease). We present visualizations comparing variable and partial effect estimation according to the ORSF, the conditional inference forest, and the Pooled Cohort Risk equations. The obliqueRSF R package, which provides functions to fit the ORSF and create variable and partial dependence plots, is available on the Comprehensive R Archive Network (CRAN).
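To make the idea of an oblique split concrete, the following is a minimal sketch in Python, not the obliqueRSF implementation. It assumes a coefficient vector is already available (in the ORSF it would come from a regularized Cox model fit at the node; here the values are made up), and it uses the median of the linear combination as the cutpoint, which is an illustrative choice rather than the rule used by the authors. The function names `linear_combination` and `oblique_split` are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def linear_combination(row: Sequence[float], coefs: Sequence[float]) -> float:
    """Return the weighted sum of one observation's input variables."""
    return sum(x * b for x, b in zip(row, coefs))


def oblique_split(
    rows: Sequence[Sequence[float]],
    coefs: Sequence[float],
    cutpoint: float,
) -> Tuple[List[int], List[int]]:
    """Partition row indices by comparing a linear combination to a cutpoint.

    An axis-aligned split would compare a single variable to the cutpoint;
    an oblique split compares a weighted sum of several variables instead.
    """
    left, right = [], []
    for i, row in enumerate(rows):
        if linear_combination(row, coefs) <= cutpoint:
            left.append(i)
        else:
            right.append(i)
    return left, right


# Toy data: four observations, two input variables.
rows = [[1.0, 2.0], [2.0, 0.5], [0.5, 0.5], [3.0, 3.0]]

# Hypothetical coefficients standing in for a regularized Cox fit (assumption).
coefs = [0.5, -1.0]

# Illustrative cutpoint: the median of the linear combination (assumption).
scores = [linear_combination(row, coefs) for row in rows]
cut = median(scores)

left, right = oblique_split(rows, coefs, cut)
print(left, right)
```

Applying the same step recursively to the left and right subsets, with new coefficients fit at each node, would grow one oblique tree; a forest would aggregate many such trees.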
