Analysis of Breast Cancer Dataset Using Big Data Algorithms for Accuracy of Diseases Prediction

Data mining techniques help manage massive amounts of data that are heterogeneous, incomplete (missing values), or inconsistent. Healthcare is one of the most important applications of Big Data, and diagnosing diseases such as cancer at an early stage is crucial. This paper analyzes prediction models that classify breast cancer diagnoses as either benign or malignant at an early stage. Early diagnosis increases the chances of successful treatment, so predicting breast cancer early improves the survival rate of women. The data mining classification algorithms SVM, Naive Bayes, k-NN, and Decision Tree are compared using a variety of statistical measures: accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and area under the curve (AUC). ROC curves are plotted in the R analytical tool, a promising independent tool for handling huge datasets, to determine which classifier gives the better prediction of breast cancer diagnosis.
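The evaluation measures named above all derive from a binary confusion matrix, and AUC derives from the classifier's scores. The sketch below is a minimal illustration in Python, not the authors' R code. It assumes label 1 means malignant (the positive class) and 0 means benign, and it computes AUC with the Mann-Whitney formulation: the probability that a randomly chosen malignant case scores higher than a randomly chosen benign one. The function names and the toy data are hypothetical and do not come from the paper's dataset.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    tp: int  # malignant predicted as malignant
    fp: int  # benign predicted as malignant
    tn: int  # benign predicted as benign
    fn: int  # malignant predicted as benign


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> ConfusionCounts:
    """Count outcomes, assuming label 1 (malignant) is the positive class."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Standard diagnostic measures; assumes no denominator is zero."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate (recall)
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "ppv": c.tp / (c.tp + c.fp),          # positive predictive value
        "npv": c.tn / (c.tn + c.fn),          # negative predictive value
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via Mann-Whitney: fraction of (positive, negative) pairs
    in which the positive case scores higher, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: true labels and classifier scores for 8 cases.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
    # Threshold the scores at 0.5 to obtain hard predictions.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    counts = confusion_counts(y_true, y_pred)
    print(counts)
    print(metrics(counts))
    print("AUC:", auc(y_true, scores))
```

On this toy data the counts are TP = 2, FP = 2, TN = 3, FN = 1, giving accuracy 0.625, and the AUC is 13/15 (about 0.867). Applying the same functions to each classifier's predictions on a shared test set is one way to compare SVM, Naive Bayes, k-NN, and Decision Tree side by side. Sweeping the threshold instead of fixing it at 0.5 traces the ROC curve.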
