Predicting breast cancer recurrence using effective classification and feature selection technique

Breast cancer is a major threat to middle-aged women throughout the world and is currently the second leading cause of cancer death in women. Early detection and prevention, however, can significantly reduce the risk of death. An important part of breast cancer prognosis is estimating the probability that the cancer will recur. This paper aims to predict breast cancer recurrence using different data mining techniques, and it proposes a novel approach to improve the accuracy of those models. Patient data were collected from the Wisconsin dataset of the UCI Machine Learning Repository. The dataset contains 35 attributes in total, to which we applied the Naive Bayes, C4.5 Decision Tree and Support Vector Machine (SVM) classification algorithms and calculated their prediction accuracy. An efficient feature selection algorithm then helped us improve the accuracy of each model by removing some of the lower-ranked attributes. These attributes not only contribute very little to the prediction, but including them also misleads the classification algorithms. After a careful selection of the upper-ranked attributes, we found a much improved accuracy rate for all three algorithms.
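
The rank-then-prune workflow described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the feature selection algorithm, so a simple Fisher score is swapped in as the ranking criterion; the data are synthetic stand-ins for the Wisconsin dataset (two informative columns and four noise columns); and only a Gaussian Naive Bayes classifier is shown, written with the Python standard library. All function names are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def make_data(n_per_class=100, n_informative=2, n_noise=4, seed=0):
    """Synthetic stand-in data (an assumption, not the Wisconsin dataset):
    informative columns shift with the label, noise columns do not."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            informative = [rng.gauss(3.0 * label, 1.0) for _ in range(n_informative)]
            noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
            rows.append(informative + noise)
            labels.append(label)
    return rows, labels


def split(rows, labels, test_fraction=0.3, seed=1):
    """Shuffle the row indices and cut them into train and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_fraction))
    tr, te = idx[:cut], idx[cut:]
    train = ([rows[i] for i in tr], [labels[i] for i in tr])
    test = ([rows[i] for i in te], [labels[i] for i in te])
    return train, test


def rank_features(rows, labels):
    """Rank columns by Fisher score (assumed criterion):
    squared difference of class means over the sum of class variances."""
    scores = []
    for j in range(len(rows[0])):
        a = [r[j] for r, y in zip(rows, labels) if y == 0]
        b = [r[j] for r, y in zip(rows, labels) if y == 1]
        scores.append((mean(a) - mean(b)) ** 2 / (pvariance(a) + pvariance(b) + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def fit_gaussian_nb(rows, labels, cols):
    """Store a class prior plus a per-column mean and variance for each class."""
    model = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        stats = []
        for j in cols:
            values = [r[j] for r in members]
            stats.append((mean(values), pvariance(values) + 1e-9))
        model[c] = (len(members) / len(rows), stats)
    return model


def predict(model, row, cols):
    """Return the class with the highest log posterior under the Gaussian model."""
    def log_posterior(c):
        prior, stats = model[c]
        total = math.log(prior)
        for j, (mu, var) in zip(cols, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (row[j] - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


def accuracy(train, test, cols):
    """Fit on the training part using only `cols`, then score on the test part."""
    (x_train, y_train), (x_test, y_test) = train, test
    model = fit_gaussian_nb(x_train, y_train, cols)
    hits = sum(predict(model, r, cols) == y for r, y in zip(x_test, y_test))
    return hits / len(y_test)


rows, labels = make_data()
train, test = split(rows, labels)
ranked = rank_features(*train)          # rank on the training part only
top_k = ranked[:2]                      # keep the upper-ranked attributes
acc_all = accuracy(train, test, list(range(len(rows[0]))))
acc_top = accuracy(train, test, top_k)
print("ranking:", ranked)
print("accuracy, all attributes:", acc_all)
print("accuracy, top-ranked attributes:", acc_top)
```

Ranking on the training part only keeps the held-out rows from influencing which attributes are kept. Whether pruning actually raises accuracy depends on the data and the classifier, so the two printed figures should be compared rather than assumed.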
