A Novel Approach for Breast Cancer Detection Using Data Mining Techniques

Breast cancer is one of the most common cancers among women and the second most common cause of cancer death in women. In India, an estimated 1 in 28 women develops breast cancer during her lifetime. The risk is higher in urban areas, at 1 in 22, than in rural areas, where it is much lower at 1 in 60. In India the average age of the high-risk group is 43-46 years, whereas in the West women aged 53-57 years are more prone to breast cancer. The aim of this paper is to investigate the performance of different classification techniques. The breast cancer data, with a total of 683 rows and 10 columns, are used to test these techniques, with classification accuracy as the measure of performance. We analyse the Wisconsin breast cancer dataset from the UCI Machine Learning Repository with the aim of developing accurate prediction models for breast cancer using data mining techniques. In this experiment, we compare three classification techniques in the Weka software, and the results show that Sequential Minimal Optimization (SMO) achieves a higher prediction accuracy (96.2%) than the IBK and BF Tree methods.
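To make the evaluation criterion concrete, the following is a minimal sketch of how classification accuracy can be estimated for one classifier. It is not the Weka workflow used in the paper. It assumes a k-nearest-neighbour classifier (the family that Weka's IBK implements), 10-fold cross-validation (the abstract does not state the evaluation protocol), and synthetic two-class data with nine features plus a label, chosen only to mirror the 10-column layout. The function names `knn_predict`, `cross_val_accuracy` and `make_toy_data` are hypothetical.

```python
import random


def knn_predict(train, query, k=1):
    """Predict the label of `query` by majority vote among its k nearest rows.

    `train` is a list of (features, label) pairs; distance is squared Euclidean.
    """
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)


def cross_val_accuracy(data, folds=10, k=1, seed=0):
    """Estimate accuracy as the fraction of rows classified correctly
    when each fold is held out once and the rest is used for training."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    correct = 0
    for i in range(folds):
        test = rows[i::folds]  # every folds-th row, starting at i
        train = [r for j, r in enumerate(rows) if j % folds != i]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(rows)


def make_toy_data(n_per_class=50, seed=1):
    """Synthetic stand-in data (assumption): two well-separated classes,
    nine numeric features each. This is NOT the Wisconsin dataset."""
    rng = random.Random(seed)
    benign = [([rng.gauss(2, 1) for _ in range(9)], 0) for _ in range(n_per_class)]
    malignant = [([rng.gauss(7, 1) for _ in range(9)], 1) for _ in range(n_per_class)]
    return benign + malignant


if __name__ == "__main__":
    data = make_toy_data()
    print(f"cross-validated accuracy: {cross_val_accuracy(data):.3f}")
```

Running the same accuracy estimate for each candidate classifier on the same folds would give figures that can be compared directly, which is the kind of comparison the abstract reports for SMO, IBK and BF Tree.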
