Comparison of different machine learning methods on Wisconsin dataset

Breast cancer research has made great progress in recent years, but there is still room for improvement. The Wisconsin Diagnostic Breast Cancer (WDBC) dataset contains 569 patient records with 32 attributes extracted from digitized images of fine needle aspirates of breast masses. We used this dataset to compare selected machine learning methods on a binary classification task. We carried out the whole analytical process in accordance with the CRISP-DM methodology, one of the most widely used process models for this purpose. Finally, we compared our results with those of several previously published research papers to evaluate our approach and expectations. The best accuracies were achieved with SVM (97.66%), Random Forests (97.37%) and C4.5 (95.61%).
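The kind of comparison described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the abstract does not state the tooling, the train/test split, or the hyperparameters, so the 70/30 stratified split, the fixed seed, and all model settings below are assumptions. The sketch uses the copy of WDBC bundled with scikit-learn, which exposes the 30 numeric features (the ID and diagnosis columns account for the remaining two attributes). scikit-learn has no C4.5 implementation, so a CART tree with the entropy criterion stands in for it. The accuracies it prints should not be expected to match the figures reported above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_models(test_size=0.3, seed=0):
    """Fit three classifiers on WDBC and return their hold-out accuracies.

    The split ratio, seed, and hyperparameters are illustrative assumptions,
    not values taken from the paper.
    """
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    models = {
        # SVMs are sensitive to feature scale, so standardize first.
        "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=seed),
        # Stand-in for C4.5: CART with an entropy-based split criterion.
        "Decision tree (CART)": DecisionTreeClassifier(
            criterion="entropy", random_state=seed
        ),
    }

    # fit() returns the estimator, so score() can be chained onto it.
    return {
        name: model.fit(X_train, y_train).score(X_test, y_test)
        for name, model in models.items()
    }


if __name__ == "__main__":
    for name, accuracy in compare_models().items():
        print(f"{name}: {accuracy:.4f}")
```

A single hold-out split gives a noisy estimate on 569 records; repeating the comparison with cross-validation would give a more stable ranking of the methods.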
