Survival Prediction and Feature Selection in Patients with Breast Cancer Using Support Vector Regression

The Support Vector Regression (SVR) model has been widely used for response prediction; however, few researchers have applied SVR to survival analysis. In this study, a new SVR model is proposed, and SVR models with different kernels, together with the traditional Cox model, are trained and compared using several performance measures. We also select the best subset of features using three feature selection methods: a combination of SVR and statistical tests, univariate feature selection based on the concordance index, and recursive feature elimination. The evaluations are performed on available medical datasets and on a Breast Cancer (BC) dataset of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. For the BC dataset, survival time is predicted more accurately by linear SVR than by nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the features most strongly associated with survival. Overall, the performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similarly to or slightly better than the other models, and SVR performs similarly to or better than Cox when all features are included in the model.
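Univariate feature selection based on the concordance index can be sketched as follows. This is a minimal illustration, not the study's code. It assumes Harrell's definition of the C-index for right-censored data. It also assumes that each feature is scored on its own, as if its values were predicted survival times. The function names `concordance_index` and `rank_features_by_cindex` are hypothetical, and scoring a feature by max(C, 1 - C) is our assumption, made because the direction of a feature's association with survival is not known in advance.

```python
import numpy as np


def concordance_index(times, events, predicted_times):
    """Harrell's C for predicted survival times (illustrative sketch).

    A pair (i, j) is comparable when subject i had the event and
    times[i] < times[j]. It is concordant when subject i also has the
    shorter predicted time; tied predictions count as half.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    predicted_times = np.asarray(predicted_times, dtype=float)
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if predicted_times[i] < predicted_times[j]:
                    concordant += 1.0
                elif predicted_times[i] == predicted_times[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def rank_features_by_cindex(X, times, events):
    """Hypothetical univariate screen: score each column by its own C-index.

    The score is max(C, 1 - C), so 0.5 means no discrimination and 1.0
    means perfect ordering in either direction.
    """
    X = np.asarray(X, dtype=float)
    scores = []
    for k in range(X.shape[1]):
        c = concordance_index(times, events, X[:, k])
        scores.append(max(c, 1.0 - c))
    order = np.argsort(scores)[::-1]  # best-scoring feature first
    return order, np.array(scores)
```

Because SVR predicts survival time, a longer prediction should go with a longer observed time. For a Cox model, whose linear predictor is a risk score (higher risk means shorter survival), the negated score would be passed as `predicted_times` instead.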
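Recursive feature elimination with a linear SVR that accounts for censoring can be sketched as below. This is not the new SVR model proposed in the study. It is a swapped-in illustration that assumes a censoring-aware loss of the kind used in support vector approaches to censored targets. Under that loss, patients with an observed event receive the usual two-sided epsilon-insensitive loss, and right-censored patients are penalized only when the prediction falls below their censoring time. The training loop (plain subgradient descent), the hyperparameters `eps`, `lam`, `n_iter` and `lr`, and all function names are hypothetical. Features are assumed to be standardized so that |w| is comparable across them, and `y` plays the role of (log) survival time.

```python
import numpy as np


def censored_eps_loss(y, f, event, eps):
    """Per-sample loss for observed times y and predictions f.

    Events (event == 1) get the two-sided epsilon-insensitive loss.
    Right-censored samples (event == 0) only know the true time is at
    least y, so only under-prediction beyond eps is penalized.
    """
    r = y - f
    two_sided = np.maximum(0.0, np.abs(r) - eps)
    one_sided = np.maximum(0.0, r - eps)
    return np.where(event == 1, two_sided, one_sided)


def fit_linear_censored_svr(X, y, event, eps=0.1, lam=1e-3, n_iter=3000, lr=0.5):
    """Minimize lam/2 * ||w||^2 + mean(censored_eps_loss) by subgradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for t in range(n_iter):
        r = y - (X @ w + b)
        # Subgradient of each sample's loss with respect to its prediction f.
        g = np.where(r > eps, -1.0, 0.0)  # under-prediction: events and censored
        g = np.where((event == 1) & (r < -eps), 1.0, g)  # over-prediction: events only
        step = lr / np.sqrt(t + 1.0)  # diminishing step size
        w -= step * (X.T @ g / n + lam * w)
        b -= step * g.mean()
    return w, b


def recursive_feature_elimination(X, y, event, **fit_kwargs):
    """Refit and drop the feature with the smallest |w| until none remain.

    Returns column indices in elimination order, so the last entry is the
    feature judged most important.
    """
    remaining = list(range(X.shape[1]))
    eliminated = []
    while remaining:
        w, _ = fit_linear_censored_svr(X[:, remaining], y, event, **fit_kwargs)
        worst = int(np.argmin(np.abs(w)))
        eliminated.append(remaining.pop(worst))
    return eliminated


if __name__ == "__main__":
    # Synthetic data: only column 0 drives the (log) survival time.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 3))
    true_time = 2.0 + 1.5 * X[:, 0]
    event = (rng.random(200) < 0.7).astype(int)
    # Censored subjects are observed before their true event time.
    y = np.where(event == 1, true_time, true_time - rng.random(200))
    print(recursive_feature_elimination(X, y, event))
```

Because only under-prediction is penalized for censored subjects, a patient still event-free at last follow-up contributes the information that their survival is at least the observed time, rather than being discarded.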
