Random rotation survival forest for high dimensional censored data

Recently, rotation forest has been extended to regression and survival analysis problems. However, because of the intensive computation incurred by principal component analysis, rotation forest often fails when confronted with high-dimensional or big data. In this study, we extend rotation forest to the analysis of high-dimensional censored time-to-event data by combining random subspace, bagging and rotation forest. Supported by proper statistical analysis, we show that the proposed method, random rotation survival forest, outperforms state-of-the-art survival ensembles such as random survival forest, as well as popular regularized Cox models.
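The combination of bagging, random subspace and a PCA rotation can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rotated_member_views`, its parameters, and the choice of a single PCA per random feature subset are assumptions made for illustration, and the survival tree that would be fitted on each rotated view is omitted.

```python
import numpy as np

def rotated_member_views(X, n_members=5, subspace_size=10, seed=0):
    """Build one rotated training view per ensemble member (illustrative only).

    Each member gets a bootstrap sample of rows (bagging), a random subset of
    columns (random subspace), and a PCA rotation computed on that small block,
    so PCA never runs on the full high-dimensional matrix.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    k = min(subspace_size, p)
    views = []
    for _ in range(n_members):
        rows = rng.integers(0, n, size=n)            # bootstrap rows, with replacement
        cols = rng.choice(p, size=k, replace=False)  # random feature subset
        block = X[np.ix_(rows, cols)]
        mean = block.mean(axis=0)
        centered = block - mean
        # PCA via SVD: the rows of vt are the principal axes of the block.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        rotation = vt.T
        views.append({
            "rows": rows,
            "cols": cols,
            "mean": mean,
            "rotation": rotation,
            "X_rot": centered @ rotation,  # input a survival tree would be trained on
        })
    return views
```

In a full ensemble, each view's `rows` would also select the matching survival times and censoring indicators, and a new observation would be projected with the stored `cols`, `mean` and `rotation` before being passed to that member's tree.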
