Predicting breast cancer survivability using random forest and multivariate adaptive regression splines

In this paper, we propose a hybrid of the random forest and multivariate adaptive regression splines (MARS) algorithms for building a breast cancer survivability prediction model. We first use random forest to perform a preliminary screening of the variables and to obtain their importance ranks. A new dataset is then extracted from the initial WDBC dataset using the top-k most important predictors and passed to the MARS procedure, which builds interpretable models for predicting breast cancer survivability. The capability of this combined method is evaluated using basic performance measures (e.g., accuracy, sensitivity, and specificity) with 10-fold cross-validation. Experimental results show that the proposed method provides higher accuracy and a relatively simple model.
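The screening and evaluation steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names (`top_k_features`, `confusion_metrics`, `k_fold_indices`) and the importance scores are hypothetical, and the random forest and MARS fitting steps are left out. The sketch only shows how top-k predictors might be picked from a set of importance scores, how accuracy, sensitivity, and specificity follow from a confusion matrix, and how 10-fold splits could be generated.

```python
import random


def top_k_features(importances, k):
    """Return the names of the k predictors with the largest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]


def confusion_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: fraction of positive cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of negative cases that were correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def k_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train, test) index lists for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding gives folds whose sizes differ by at most one sample.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


if __name__ == "__main__":
    # Hypothetical importance scores, for illustration only.
    scores = {"radius": 0.30, "texture": 0.10, "area": 0.50, "symmetry": 0.05}
    print(top_k_features(scores, k=2))
    print(confusion_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 0, 1]))
    for train, test in k_fold_indices(20, n_folds=10):
        print(len(train), len(test))
```

In a full pipeline under these assumptions, each fold would fit the screening and modeling steps on the training indices only and report the three measures on the held-out indices, averaged over the ten folds.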
