Classification of Patients with Breast Cancer using Neighbourhood Component Analysis and Supervised Machine Learning Techniques

Breast cancer is one of the leading causes of death among women. In Morocco, the Ministry of Health reports over 40,000 new cases each year. While a healthy lifestyle can help prevent the disease, early detection remains a major factor in reducing its mortality. Machine learning (ML) algorithms offer an alternative to standard breast cancer prediction techniques, or can at least assist radiologists in their reasoning and thus spare many women, and some men, a breast biopsy. This study benchmarks several ML models: it applies and compares four algorithms (kNN, decision tree, binary SVM, and AdaBoost) to predict whether a patient has a malignant or a benign tumor. The models were trained and then tested on the Breast Cancer Wisconsin dataset. The dataset's features are first passed through a feature selection step based on Neighbourhood Components Analysis (NCA) to reduce the number of features and therefore decrease the complexity of the model. The predictive accuracy reached 99.12% for the kNN model, the best predictive specificity was 98.86% for the binary SVM model, and the highest predictive sensitivity reached one (100%) for both the kNN and AdaBoost models.
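The three figures reported above (accuracy, specificity, sensitivity) all derive from the confusion matrix of a binary classifier. The following is a minimal sketch of how they can be computed, not the authors' code: the function name `classification_metrics`, the label strings, and the toy predictions are hypothetical and chosen only for illustration. Malignant is treated as the positive class.

```python
def classification_metrics(y_true, y_pred, positive="malignant"):
    """Compute accuracy, sensitivity and specificity for a binary task.

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # malignant correctly flagged
            else:
                fp += 1  # benign wrongly flagged as malignant
        else:
            if truth == positive:
                fn += 1  # malignant case missed
            else:
                tn += 1  # benign correctly cleared
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Toy example (made-up labels, not results from the study):
# 4 malignant and 6 benign cases, with one benign case misclassified.
y_true = ["malignant"] * 4 + ["benign"] * 6
y_pred = ["malignant"] * 4 + ["malignant"] + ["benign"] * 5

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the sensitivity equals one because no malignant case is missed, while the single false positive lowers both specificity and accuracy. A sensitivity of one, as reported for kNN and AdaBoost, therefore means no false negatives on the test set.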
