An Efficient Prediction of Breast Cancer Data using Data Mining Techniques

Breast cancer is one of the leading causes of cancer death among women and has become one of the most dangerous types of cancer among women worldwide. Early detection is essential to reducing loss of life. This paper compares several data mining classifiers on the Wisconsin Breast Cancer (WBC) dataset, using classification accuracy as the measure. The aim is to establish an accurate classification model for breast cancer prediction that makes full use of the valuable information in clinical data, especially information that most existing methods ignore when they aim for high prediction accuracy. Experiments were run on the WBC data, divided into a training set of 499 patients and a test set of 200 patients. Six classification techniques were compared in the Weka software, and the results show that the Support Vector Machine (SVM) achieves higher prediction accuracy than the other methods. From these results, we infer that SVMs are more suitable than the other methods compared for the classification problem of breast cancer prediction, and we recommend their use in similar classification problems.

Keywords: breast cancer; classification; decision tree; Naïve Bayes; MLP; logistic regression; SVM; KNN; Weka
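The evaluation protocol described above (hold out 200 of 699 records, train on the remaining 499, and rank classifiers by test-set accuracy) can be illustrated with a minimal sketch. This is not the Weka setup used in the paper: the data are synthetic stand-ins, the nine integer features per record are an assumption made for illustration, and only two simple classifiers (a majority-class baseline and k-nearest neighbours) are shown in place of the six compared. All function names are hypothetical.

```python
import random
from collections import Counter

def split_train_test(records, n_test, seed=0):
    """Shuffle a copy of the records and hold out n_test of them for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predict, test_set):
    """Fraction of test records whose predicted label equals the true label."""
    correct = sum(1 for features, label in test_set if predict(features) == label)
    return correct / len(test_set)

def make_majority_classifier(train_set):
    """Baseline: always predict the most frequent training label."""
    majority = Counter(label for _, label in train_set).most_common(1)[0][0]
    return lambda features: majority

def make_knn_classifier(train_set, k=5):
    """k-nearest neighbours: squared Euclidean distance, majority vote."""
    def predict(features):
        nearest = sorted(
            train_set,
            key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], features)),
        )[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def make_synthetic_records(n=699, seed=0):
    """Hypothetical stand-in data (NOT the real WBC records):
    9 integer features in 1..10 and a benign/malignant label."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        label = "malignant" if rng.random() < 0.35 else "benign"
        low, high = (5, 10) if label == "malignant" else (1, 5)
        records.append(([rng.randint(low, high) for _ in range(9)], label))
    return records

records = make_synthetic_records()
train_set, test_set = split_train_test(records, n_test=200)

# Train each classifier on the 499 training records, score it on the 200 test records.
results = {
    "majority baseline": accuracy(make_majority_classifier(train_set), test_set),
    "kNN (k=5)": accuracy(make_knn_classifier(train_set, k=5), test_set),
}
for name, acc in sorted(results.items(), key=lambda item: -item[1]):
    print(f"{name}: {acc:.3f}")
```

Including a majority-class baseline is a design choice of this sketch rather than something the paper reports: it gives a floor against which any classifier's accuracy can be read, which matters when the two classes are not equally frequent.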
