Applications of Machine Learning Techniques to Predict Diagnostic Breast Cancer

This article compares six machine learning (ML) algorithms, Classification and Regression Tree (CART), Support Vector Machine (SVM), Naive Bayes (NB), K-Nearest Neighbors (KNN), Logistic Regression (LR) and Multilayer Perceptron (MLP), on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset by estimating their test accuracy, their accuracy on standardized data and their runtime. The main objective of the study is to improve prediction accuracy using a new statistical method of feature selection. The dataset has 32 features, which are reduced using a statistical technique (the mode), and the same measurements are then repeated on the reduced data for comparison. On the reduced subset (12 features), we apply six ensemble models, AdaBoost (AB), Gradient Boosting Classifier (GBC), Random Forest (RF), Extra Trees (ET), Bagging and Extreme Gradient Boosting (XGB), to minimize the probability of misclassification that arises from relying on any single induced model. We also apply a stacking classifier (Voting Classifier) to six base learners, Logistic Regression (LR), Decision Tree (DT), Support Vector Classifier (SVC), K-Nearest Neighbors (KNN), Random Forest (RF) and Naive Bayes (NB), to measure the accuracy obtained by the voting classifier at the meta level. To train and evaluate the ML algorithms, the dataset is split so that 80% is used for training and 20% for testing. The classifiers are tuned with manually assigned hyperparameters. Across the different stages of classification, all ML algorithms perform well, with test accuracy exceeding 90%, especially when applied to the reduced data subset.
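The evaluation protocol described above (an 80/20 split, standardization, and a majority vote over base learners) can be illustrated with a minimal sketch. It uses only the Python standard library, and the helper names `split_80_20`, `standardize`, `hard_vote` and `accuracy` are hypothetical, chosen for illustration; they are not the authors' code. It also assumes hard (majority) voting, which the abstract does not specify.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def split_80_20(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def standardize(train_column, column):
    """Z-score a column using the mean and std of the training column only."""
    mu, sigma = mean(train_column), pstdev(train_column)
    # A constant training column has sigma == 0; map everything to 0.0.
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def hard_vote(predictions):
    """Return the label predicted by the most base learners."""
    return Counter(predictions).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    train, test = split_80_20(range(569))  # WDBC has 569 samples
    print(len(train), len(test))
    # Three hypothetical base learners vote on one sample.
    print(hard_vote(["M", "B", "M"]))
```

In practice a library such as scikit-learn covers the same steps with `train_test_split`, `StandardScaler` and `VotingClassifier`; the sketch only makes the logic of each step explicit.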
