Modified Bat Algorithm for Feature Selection with the Wisconsin Diagnosis Breast Cancer (WDBC) Dataset

Early diagnosis of breast cancer is essential for saving patients' lives. Medical datasets usually contain a large variety of data, which can cause confusion during diagnosis. The Knowledge Discovery in Databases (KDD) process helps to improve efficiency, but it requires the removal of irrelevant and redundant data from the dataset before the final diagnosis. This can be done with any of the feature selection algorithms available in data mining, and feature selection is considered a vital step in increasing classification accuracy. This paper proposes a Modified Bat Algorithm (MBA) for feature selection that eliminates irrelevant features from the original dataset. The Bat algorithm was modified to use simple random sampling to select random instances from the dataset. Ranking was then performed with the global best features to identify the predominant features in the dataset. The selected features are used to train a Random Forest (RF) classifier. MBA feature selection improved the classification accuracy of RF in identifying the occurrence of breast cancer. The performance of the proposed MBA feature selection algorithm was evaluated on the Wisconsin Diagnosis Breast Cancer (WDBC) dataset. The proposed algorithm achieved better performance, that is, a higher Kappa statistic, Matthews Correlation Coefficient (MCC), precision, recall and F-measure, and a lower Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE).
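The abstract describes the MBA only at a high level. The following is a minimal sketch of one possible reading, not the authors' implementation. It makes several assumptions:

- It uses Yang's standard bat update rules (frequency, velocity, loudness, pulse rate) with a V-shaped transfer function that turns velocities into bit flips, as in binary bat algorithms.
- "Simple random sampling" is read as drawing a fresh random subset of instances, without replacement, for every fitness evaluation.
- "Ranking with the global best features" is read as ranking features by how often they appear in the global best solution across iterations.

To keep the sketch dependency-free, a nearest-centroid classifier stands in for Random Forest, and synthetic toy data stands in for WDBC. All function names and parameter values (`n_bats`, `f_max`, `alpha`, `gamma`, `penalty`, the 0.1 local-search flip rate, etc.) are hypothetical.

```python
import math
import random


def nearest_centroid_accuracy(X, y, mask, rows):
    """Hold-out accuracy of nearest-centroid (stand-in for Random Forest) on the
    selected features; `rows` is in random order: first half trains, rest tests."""
    feats = [j for j, bit in enumerate(mask) if bit]
    if not feats:
        return 0.0  # an empty feature subset cannot classify anything
    half = len(rows) // 2
    train, test = rows[:half], rows[half:]
    sums, counts = {}, {}
    for i in train:
        s = sums.setdefault(y[i], [0.0] * len(feats))
        for k, j in enumerate(feats):
            s[k] += X[i][j]
        counts[y[i]] = counts.get(y[i], 0) + 1
    cents = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    correct = 0
    for i in test:
        pred = min(cents, key=lambda c: sum((X[i][j] - m) ** 2
                                            for j, m in zip(feats, cents[c])))
        correct += pred == y[i]
    return correct / len(test)


def modified_bat_feature_selection(X, y, n_bats=10, n_iter=30, sample_frac=0.5,
                                   f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9,
                                   v_max=4.0, penalty=0.01, seed=0):
    """Binary bat algorithm with simple random sampling. Returns (best mask,
    best fitness, features ranked by how often they were in the global best)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(2, int(sample_frac * n))

    def fitness(mask):
        # Simple random sampling: fresh k instances (no replacement) per evaluation.
        rows = rng.sample(range(n), k)
        return nearest_centroid_accuracy(X, y, mask, rows) - penalty * sum(mask) / d

    pos = [[rng.randint(0, 1) for _ in range(d)] for _ in range(n_bats)]
    vel = [[0.0] * d for _ in range(n_bats)]
    loud = [1.0] * n_bats                        # loudness A_i
    r0 = [rng.random() for _ in range(n_bats)]   # initial pulse rates
    pulse = r0[:]                                # current pulse rates r_i
    fit = [fitness(p) for p in pos]
    b = max(range(n_bats), key=fit.__getitem__)
    best, best_fit = pos[b][:], fit[b]
    in_best = [0] * d

    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()
            cand = pos[i][:]
            for j in range(d):
                v = vel[i][j] + (pos[i][j] - best[j]) * freq
                vel[i][j] = max(-v_max, min(v_max, v))
                # V-shaped transfer function: |v| -> probability of flipping bit j.
                if rng.random() < abs(2 / math.pi * math.atan(math.pi / 2 * vel[i][j])):
                    cand[j] = 1 - cand[j]
            if rng.random() > pulse[i]:
                # Local search: flip each bit of the global best with prob. 0.1.
                cand = [1 - bit if rng.random() < 0.1 else bit for bit in best]
            f_new = fitness(cand)
            if f_new >= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, f_new
                loud[i] *= alpha                                # bat gets quieter
                pulse[i] = r0[i] * (1 - math.exp(-gamma * t))   # and pulses more
            if f_new >= best_fit:
                best, best_fit = cand[:], f_new
        for j in range(d):
            in_best[j] += best[j]  # tally global-best membership per iteration

    ranking = sorted(range(d), key=lambda j: -in_best[j])
    return best, best_fit, ranking


def make_toy_data(n=200, seed=42):
    """Toy stand-in for WDBC: features 0-1 are informative, 2-9 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 1.5 if label else -1.5
        X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]
                 + [rng.gauss(0.0, 1.0) for _ in range(8)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best, best_fit, ranking = modified_bat_feature_selection(X, y)
    print("selected features:", [j for j, bit in enumerate(best) if bit])
    print("best fitness: %.3f" % best_fit)
    print("feature ranking:", ranking)
```

To move closer to the setup in the abstract, you could replace `nearest_centroid_accuracy` with Random Forest accuracy (for example, scikit-learn's `RandomForestClassifier`). WDBC itself ships with scikit-learn as `sklearn.datasets.load_breast_cancer` (569 instances, 30 features).

Because every evaluation draws a different sample, fitness values are noisy. A lucky sample can therefore hold the global best in place. Re-evaluating the final subset on the full dataset is a sensible check.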
