Predicting the Severity of Breast Masses with Data Mining Methods

Mammography is the most effective and widely available tool for breast cancer screening. However, because biopsies prompted by mammogram interpretation have a low positive predictive value, approximately 70% of biopsies are unnecessary and have benign outcomes. Data mining algorithms could help physicians decide whether to perform a breast biopsy on a suspicious lesion seen in a mammogram or to perform a short-term follow-up examination instead. In this paper, three data mining classification algorithms are analyzed on the mammographic masses data set: Decision Tree (DT), Artificial Neural Network (ANN), and Support Vector Machine (SVM). The purpose of this study is to improve physicians' ability to determine the severity (benign or malignant) of a mammographic mass lesion from its BI-RADS attributes and the patient's age. The data set is split 70:30 into training and test sets, and the classifiers are compared using three statistical measures: sensitivity, specificity, and classification accuracy. On the test samples, DT, ANN and SVM achieve accuracies of 78.12%, 80.56% and 81.25%, respectively. Our analysis shows that, of the three classification models, SVM predicts the severity of breast masses with the lowest error rate and the highest accuracy.
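The evaluation protocol summarized above (a 70:30 train/test split scored by sensitivity, specificity and accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `split_70_30` and `evaluate`, the fixed random seed, and the label encoding 1 = malignant / 0 = benign are hypothetical choices, and the DT, ANN and SVM models themselves are omitted. Any classifier that outputs a 0/1 prediction per test sample could be scored this way.

```python
import random
from typing import Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_70_30(samples: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into 70% training / 30% test.

    The seed is a hypothetical choice to make the split reproducible.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, accuracy and error rate.

    Assumed encoding: 1 = malignant (positive), 0 = benign (negative).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malignant, flagged malignant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # benign, flagged benign
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign, flagged malignant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malignant, missed
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
    }
```

For example, with true labels `[1, 1, 1, 0, 0, 0, 0, 1]` and predictions `[1, 1, 0, 0, 0, 1, 0, 0]`, `evaluate` gives a sensitivity of 0.5 (two of four malignant masses caught), a specificity of 0.75 (three of four benign masses recognized) and an accuracy of 0.625. Sensitivity is reported separately because, in this setting, a missed malignant mass is typically far more costly than an unnecessary biopsy.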