Feature Selection Mammogram based on Breast Cancer Mining

Dense breast tissue in mammogram images often makes it difficult for radiologists to interpret mammograms objectively and accurately. One of the key success factors of a computer-aided diagnosis (CADx) system is the use of the right features. This research therefore emphasizes the feature selection process by applying data mining to the results of mammogram image feature extraction. Two algorithms are used to perform the mining: decision tree and rule induction. The features selected by these algorithms are then tested using three classification algorithms (k-nearest neighbors, decision tree, and naive Bayes) under 10-fold cross-validation with stratified sampling. Five descriptors emerge as the best features and contribute to classifying lesions as benign or malignant: slice, integrated density, area fraction, model gray value, and center of mass. The best classification results based on these five features are produced by the decision tree algorithm, with accuracy, sensitivity, specificity, FPR, and TPR of 93.18%, 87.5%, 3.89%, 6.33%, and 92.11%, respectively.
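The evaluation scheme described above (stratified 10-fold cross-validation followed by confusion-matrix metrics) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the two-dimensional feature vectors are synthetic toy data, the function names `stratified_folds`, `knn_predict`, `binary_metrics`, and `cross_validate` are hypothetical, and a simple 3-nearest-neighbor vote stands in for the classifiers. The metrics follow the usual textbook definitions, in which sensitivity is the true positive rate and FPR is one minus specificity; the paper may compute its reported figures differently.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = [train_y[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


def binary_metrics(y_true, y_pred, positive="malignant"):
    """Accuracy, sensitivity (TPR), specificity, and FPR from paired labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
    }


def cross_validate(features, labels, k_folds=10):
    """Pool held-out predictions over all folds, then score them once."""
    y_true, y_pred = [], []
    for held_out in stratified_folds(labels, k_folds):
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        train_x = [features[i] for i in train_idx]
        train_y = [labels[i] for i in train_idx]
        for i in held_out:
            y_true.append(labels[i])
            y_pred.append(knn_predict(train_x, train_y, features[i]))
    return binary_metrics(y_true, y_pred)


# Synthetic, well-separated toy data (NOT mammogram measurements).
features = [(i * 0.1, i * 0.1) for i in range(20)]
features += [(5 + i * 0.1, 5 + i * 0.1) for i in range(20)]
labels = ["benign"] * 20 + ["malignant"] * 20

results = cross_validate(features, labels)
print(results)
```

Pooling the held-out predictions before scoring gives a single confusion matrix over all samples; averaging per-fold metrics is an equally common alternative and can give slightly different numbers on small datasets.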
