A Comparative Analysis of Breast Cancer Detection and Diagnosis Using Data Visualization and Machine Learning Applications

In the developing world, cancer mortality is one of the major problems facing humankind. Although many cancers can be prevented, some types still have no treatment. Breast cancer is one of the most common cancer types, and early, accurate diagnosis is among the most important steps in its treatment. Many studies in the literature address the prediction of breast tumor types. In this paper, breast tumor data from Dr. William H. Wolberg of the University of Wisconsin Hospital were used to predict breast tumor types. Data visualization and machine learning techniques, including logistic regression, k-nearest neighbors, support vector machine, naïve Bayes, decision tree, random forest, and rotation forest, were applied to this dataset. R, Minitab, and Python were used to implement these machine learning techniques and visualizations. The paper aimed to make a comparative analysis of data visualization and machine learning applications for breast cancer detection and diagnosis. The diagnostic performances of the applications were comparable in detecting breast cancer. Data visualization and machine learning techniques can provide significant benefits for cancer detection and support the decision-making process. The logistic regression model with all features included achieved the highest classification accuracy (98.1%), and the proposed approach showed improved accuracy. These results indicate the potential to open new opportunities in the detection of breast cancer.
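The comparison described above, in which several classifiers are trained on the same data and ranked by classification accuracy, can be illustrated with a minimal sketch. It is not the authors' pipeline: it uses only the Python standard library, two of the listed techniques (k-nearest neighbors and Gaussian naïve Bayes), and hypothetical synthetic two-feature data in place of the Wisconsin dataset. All function names, the 70/30 hold-out split, and the parameter values are assumptions made for illustration.

```python
import math
import random
from statistics import mean, pstdev


def make_toy_data(n_per_class=100, seed=0):
    """Hypothetical two-feature samples (NOT the Wisconsin data).

    Label 0 stands in for benign, label 1 for malignant.
    """
    rng = random.Random(seed)
    data = []
    for label, center in ((0, 2.0), (1, 6.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
            data.append((features, label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def fit_gaussian_nb(train):
    """Estimate the class prior and per-feature mean and standard deviation."""
    model = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        columns = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(train),
            "mean": [mean(c) for c in columns],
            "std": [pstdev(c) or 1e-9 for c in columns],  # avoid zero spread
        }
    return model


def nb_predict(model, x):
    """Pick the class with the larger Gaussian log-posterior."""
    def log_posterior(label):
        params = model[label]
        total = math.log(params["prior"])
        for value, mu, sigma in zip(x, params["mean"], params["std"]):
            total -= math.log(sigma * math.sqrt(2 * math.pi))
            total -= (value - mu) ** 2 / (2 * sigma ** 2)
        return total
    return max((0, 1), key=log_posterior)


def accuracy(predict, test):
    """Fraction of held-out samples whose predicted label is correct."""
    return sum(1 for x, y in test if predict(x) == y) / len(test)


def compare_classifiers(seed=0):
    """Train both classifiers on 70% of the data and score them on the rest."""
    data = make_toy_data(seed=seed)
    split = int(0.7 * len(data))
    train, test = data[:split], data[split:]
    nb_model = fit_gaussian_nb(train)
    return {
        "k-nearest neighbors": accuracy(lambda x: knn_predict(train, x), test),
        "naive Bayes": accuracy(lambda x: nb_predict(nb_model, x), test),
    }


if __name__ == "__main__":
    for name, acc in compare_classifiers().items():
        print(f"{name}: {acc:.3f}")
```

The accuracies this sketch prints describe only the synthetic data and bear no relation to the 98.1% reported above. A study-scale comparison would load the actual tumor measurements and would typically use cross-validation rather than a single hold-out split.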
