Survival analysis for high-dimensional, heterogeneous medical data: Exploring feature extraction as an alternative to feature selection

BACKGROUND In clinical research, the quantity of primary interest is often the time until an adverse event occurs, which is the subject of survival analysis. Applying survival analysis to electronic health records is challenging for two main reasons: (1) patient records consist of high-dimensional feature vectors, and (2) feature vectors mix categorical and real-valued features, so statistical properties vary among features. To learn from high-dimensional data, researchers can choose from a wide range of feature selection and feature extraction methods. Whereas feature selection is well studied, little work has focused on feature extraction techniques for survival analysis.

RESULTS We investigate how well feature extraction methods can deal with features that have varying statistical properties. In particular, we consider multiview spectral embedding algorithms, which have been developed specifically for these situations. We propose to use random survival forests to accurately determine local neighborhood relations from right-censored survival data. We evaluated 10 combinations of feature extraction methods and 6 survival models, with and without intrinsic feature selection, on 3 clinical datasets. Our results demonstrate that for small sample sizes (fewer than 500 patients), models with built-in feature selection (Cox model with ℓ1 penalty, random survival forest, and gradient boosted models) outperform feature extraction methods by a median margin of 6.3% in concordance index (inter-quartile range: [-1.2%, 14.6%]).

CONCLUSIONS If the number of samples is insufficient, feature extraction methods are unable to reliably identify the underlying manifold, which makes them of limited use in these situations. For large sample sizes (in our experiments, 2500 samples or more), feature extraction methods perform as well as feature selection methods.
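The comparison above is reported in terms of the concordance index. As a rough illustration of what that metric measures on right-censored data, the following is a minimal sketch of Harrell's concordance index. The function name `concordance_index` and its arguments are hypothetical and not taken from the paper, and the sketch assumes a simplified convention in which pairs with tied event times are skipped; the paper's exact evaluation procedure may differ.

```python
from typing import Sequence


def concordance_index(
    time: Sequence[float],
    event: Sequence[bool],
    risk: Sequence[float],
) -> float:
    """Harrell's concordance index for right-censored data (illustrative sketch).

    time  -- observed time (time of event or time of censoring)
    event -- True if the event was observed, False if the record is censored
    risk  -- predicted risk score; higher means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        # A censored subject cannot be the earlier member of a comparable pair,
        # because its true event time is unknown.
        if not event[i]:
            continue
        for j in range(n):
            # Assumption: only strictly later times form a comparable pair;
            # tied times are ignored in this simplified version.
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs; concordance index is undefined")
    return concordant / comparable


if __name__ == "__main__":
    times = [1.0, 2.0, 3.0, 4.0]
    events = [True, True, False, True]
    print(concordance_index(times, events, [4.0, 3.0, 2.0, 1.0]))
```

A value of 0.5 corresponds to random ordering and 1.0 to a perfect ranking of patients by risk, so a margin of a few percentage points in this index reflects a difference in how often pairs of patients are ranked correctly.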
