Breast Cancer Diagnosis Using an Unsupervised Feature Extraction Algorithm Based on Deep Learning

Breast cancer is the most commonly diagnosed cancer in women. At the same time, it is one of the most curable types of cancer if diagnosed early. With advances in detection technology, a growing amount of clinical data and high-dimensional features can be used for breast cancer diagnosis. High-dimensional data contribute to advances in diagnostic technology, but they also introduce substantial computational redundancy. Extracting the important information and reducing the feature dimension is therefore critical for effective prediction and accurate treatment decisions. However, previous work on breast cancer diagnosis relies mainly on labeled data, which are difficult to obtain. To address this issue, we propose a new scheme for breast cancer diagnosis that integrates a deep learning based unsupervised feature extraction algorithm, the stacked auto-encoder, with a support vector machine (SAE-SVM). The stacked auto-encoder, trained with greedy layer-wise pre-training and an improved momentum update algorithm, captures the essential information in the original data and extracts the necessary features. A support vector machine is then employed to classify the samples, represented by the new features, as malignant or benign. The proposed method was tested on the Wisconsin Diagnostic Breast Cancer data set. Its performance is evaluated using various measures and compared with previously published results. The comparison shows that the proposed SAE-SVM method improves the accuracy to 98.25% and outperforms the other methods. Deep learning based unsupervised feature extraction significantly improves classification performance and provides a promising approach to breast cancer diagnosis.
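The SAE-SVM pipeline described above (unsupervised greedy layer-wise pre-training of stacked auto-encoders, followed by an SVM on the learned codes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not specify the "improved momentum update", so plain classical momentum is used as a stand-in. The SVM kernel is not specified either, so a linear SVM trained with Pegasos-style sub-gradient descent is swapped in. The layer sizes, learning rates, epochs and all function names are hypothetical. The data are a synthetic 30-feature, two-class stand-in rather than the Wisconsin data. The real WDBC set ships with scikit-learn as `sklearn.datasets.load_breast_cancer` and could be substituted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def pretrain_layer(X, n_hidden, epochs=200, lr=0.1, momentum=0.9, seed=0):
    """Fit one auto-encoder (X -> sigmoid code -> linear reconstruction) with
    full-batch gradient descent plus CLASSICAL momentum (a stand-in for the
    paper's unspecified improved momentum update); return the encoder."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)            # encode
        X_hat = H @ W2 + b2                 # decode (linear output layer)
        err = (X_hat - X) / n               # grad of mean 0.5*||x_hat - x||^2
        dH = (err @ W2.T) * H * (1.0 - H)   # back-propagate through the sigmoid
        grads = [X.T @ dH, dH.sum(0), H.T @ err, err.sum(0)]
        for p, v, g in zip(params, velocity, grads):
            v *= momentum                   # momentum step, updated in place
            v -= lr * g
            p += v
    return W1, b1


def stacked_autoencoder(X, layer_sizes):
    """Greedy layer-wise pre-training: layer k learns to reconstruct the
    codes produced by layer k-1, then is frozen. Labels are never used."""
    encoders, H = [], X
    for i, n_hidden in enumerate(layer_sizes):
        W, b = pretrain_layer(H, n_hidden, seed=i)
        encoders.append((W, b))
        H = sigmoid(H @ W + b)
    return encoders


def encode(encoders, X):
    for W, b in encoders:
        X = sigmoid(X @ W + b)
    return X


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularized
    hinge loss; labels y in {-1, +1}; bias folded in as a constant feature."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, t = np.zeros(Xb.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * (Xb[i] @ w)
            w *= 1.0 - eta * lam            # shrink (gradient of the L2 term)
            if margin < 1.0:                # hinge loss is active
                w += eta * y[i] * Xb[i]
    return w


def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0.0, 1, -1)


# Synthetic stand-in for a 30-feature, two-class data set (NOT the WDBC data).
rng = np.random.default_rng(42)
n_per_class, d = 200, 30
X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, d)),
               rng.normal(1.0, 1.0, (n_per_class, d))])
y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]

mu, sd = X[train].mean(0), X[train].std(0)
Xs = (X - mu) / sd                          # standardize with training statistics

encoders = stacked_autoencoder(Xs[train], [20, 10])  # hypothetical layer sizes
Z = encode(encoders, Xs)
Z = (Z - Z[train].mean(0)) / (Z[train].std(0) + 1e-12)  # rescale codes for the SVM

w = train_linear_svm(Z[train], y[train])
acc = np.mean(predict(w, Z[test]) == y[test])
print(f"test accuracy: {acc:.3f}")
```

The feature extractor is fit without labels, which mirrors the motivation in the abstract. Only the final SVM consumes `y`. Swapping in a kernel SVM (for example scikit-learn's `SVC`) would leave the auto-encoder stage unchanged.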