Analysis of Breast Cancer Detection Using Different Machine Learning Techniques

Data mining algorithms play an important role in predicting breast cancer at an early stage. In this paper, we propose an approach that improves the accuracy and overall performance of three classifiers: Decision Tree (J48), Naïve Bayes (NB), and Sequential Minimal Optimization (SMO). We validate and compare these classifiers on two benchmark datasets: the Wisconsin Breast Cancer (WBC) dataset and the Breast Cancer dataset. Imbalanced class distributions are a major problem in the classification phase. Because instances belong to the majority class with much higher probability, the algorithms are far more likely to assign new observations to that class. We address this problem with a data-level approach that resamples the data to mitigate the effect of class imbalance. For evaluation, 10-fold cross-validation is performed. The efficiency of each classifier is assessed in terms of true positive (TP) rate, false positive (FP) rate, ROC curve, standard deviation (Std), and accuracy (AC). Experiments show that applying a resample filter improves the classifiers' performance: SMO outperforms the others on the WBC dataset, and J48 is superior on the Breast Cancer dataset.
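The abstract argues that, on imbalanced data, classifiers drift toward the majority class. It uses resampling to counter this and 10-fold cross-validation with TP rate, FP rate, accuracy and Std to measure the effect. The stdlib-only Python sketch below illustrates those mechanics under stated assumptions, and it is not the authors' implementation:

- `resample_uniform` oversamples every class with replacement up to the size of the largest class. This is an assumed approximation of a "resample filter", because the abstract does not give the exact filter or its settings.
- `stratified_folds` builds the 10 folds.
- `rates` computes the TP rate, FP rate and accuracy for one positive class.

All function names and the toy data are hypothetical. J48, NB and SMO are not reimplemented; a majority-class baseline stands in for a classifier.

```python
import random
import statistics
from collections import Counter


def resample_uniform(rows, labels, seed=0):
    """Oversample each class with replacement up to the largest class size.
    Assumed stand-in for a resample filter, not necessarily the paper's."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(rows, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_rows, out_labels = [], []
    for y, xs in by_class.items():
        # Keep every original instance, then draw extra copies at random.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_rows.extend(xs + extra)
        out_labels.extend([y] * target)
    return out_rows, out_labels


def stratified_folds(labels, k=10, seed=0):
    """Split instance indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        idx = by_class[y][:]
        rng.shuffle(idx)
        for i in idx:  # deal indices round-robin across the folds
            folds[pos % k].append(i)
            pos += 1
    return folds


def rates(y_true, y_pred, positive):
    """True positive rate, false positive rate and accuracy for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = len(pairs) - tp - fn - fp
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr, (tp + tn) / len(pairs)


def majority_classifier(train_rows, train_labels):
    """Hypothetical baseline: always predicts the most frequent training class."""
    label = Counter(train_labels).most_common(1)[0][0]
    return lambda row: label


def cross_validate(rows, labels, fit, positive, k=10, resample=False):
    """k-fold CV returning pooled TP rate, FP rate, accuracy and the sample
    standard deviation of per-fold accuracy. If resample=True, only the
    training part of each fold is resampled, so copies never reach a test fold."""
    accs, y_true, y_pred = [], [], []
    for test_idx in stratified_folds(labels, k):
        test_set = set(test_idx)
        train = [i for i in range(len(labels)) if i not in test_set]
        tr_rows = [rows[i] for i in train]
        tr_labels = [labels[i] for i in train]
        if resample:
            tr_rows, tr_labels = resample_uniform(tr_rows, tr_labels)
        model = fit(tr_rows, tr_labels)
        fold_true = [labels[i] for i in test_idx]
        fold_pred = [model(rows[i]) for i in test_idx]
        accs.append(rates(fold_true, fold_pred, positive)[2])
        y_true += fold_true
        y_pred += fold_pred
    tpr, fpr, acc = rates(y_true, y_pred, positive)
    return tpr, fpr, acc, statistics.stdev(accs)


if __name__ == "__main__":
    # Synthetic, illustrative data only: 90 "benign" vs 10 "malignant".
    labels = ["benign"] * 90 + ["malignant"] * 10
    rows = [[float(i)] for i in range(100)]
    print(cross_validate(rows, labels, majority_classifier, "malignant"))
    _, balanced = resample_uniform(rows, labels)
    print(Counter(balanced))
```

On this synthetic 90/10 split, the majority baseline reaches 0.9 accuracy while detecting none of the minority class (TP rate 0.0). That is the failure mode the abstract describes, and it is why accuracy alone is a misleading metric here. This sketch resamples only the training folds. The abstract does not state in which order the authors applied the filter and cross-validation.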