A Comparative Analysis of Breast Cancer Diagnosis by Fusing Visual and Semantic Feature Descriptors

Computer-aided Diagnosis (CAD) systems have become a significant assistive tool, helping to identify abnormal/normal regions of interest in mammograms faster and more effectively than human readers. In this work, we propose a new approach for breast cancer identification covering all types of lesions in digital mammograms, combining low- and high-level mammogram descriptors in a compact form. The proposed method consists of two major stages. First, a feature-extraction process that utilizes two-dimensional discrete transforms based on ART and Shapelets, together with textural representations based on Gabor filter banks, is used to extract low-level visual descriptors. To further improve the method's performance, the semantic information provided for each mammogram by radiologists is encoded as a 16-bit-word high-level feature vector. All features are stored in a quaternion and fused using the L2 norm prior to their presentation to the classification module. For the classification task, each ROI is recognized using two different classification models, AdaBoost and Random Forest. The proposed method is evaluated on regions taken from the DDSM database. The results show that AdaBoost outperforms Random Forest in terms of accuracy (99.2% $(\pm 0.527)$ against 93.78% $(\pm 1.659)$), precision, recall and F-measure. The two classifiers achieve mean accuracies 33% and 38% higher, respectively, than those obtained using visual descriptors alone, showing that semantic information can indeed improve the diagnosis when it is combined with standard visual features.
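The fusion step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the four descriptors (ART, Shapelet, Gabor, semantic) are assigned one per quaternion component and that the L2-norm fusion is applied element-wise (the quaternion modulus); the specific semantic flags are hypothetical.

```python
import numpy as np

def encode_semantic(flags):
    """Pack up to 16 boolean radiologist annotations into a 16-element
    binary vector. The individual flag meanings are hypothetical; the
    paper only states that semantic information fills a 16-bit word."""
    vec = np.zeros(16)
    for i, f in enumerate(flags[:16]):
        vec[i] = float(bool(f))
    return vec

def fuse_quaternion_l2(art, shapelet, gabor, semantic):
    """Treat four equal-length descriptors as the four components of a
    quaternion per feature dimension and fuse them with the element-wise
    L2 norm (the quaternion modulus). Component assignment is an
    assumption made for illustration."""
    q = np.stack([art, shapelet, gabor, semantic])  # shape (4, d)
    return np.linalg.norm(q, axis=0)                # fused (d,) descriptor

# Toy example with random low-level descriptors of dimension d = 16.
rng = np.random.default_rng(0)
d = 16
art, shapelet, gabor = (rng.random(d) for _ in range(3))
semantic = encode_semantic([1, 0, 1, 1] + [0] * 12)
fused = fuse_quaternion_l2(art, shapelet, gabor, semantic)
```

The fused vector keeps the dimensionality of the individual descriptors, so it can be passed unchanged to an off-the-shelf classifier such as scikit-learn's AdaBoost or Random Forest implementations.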