Classification of imbalanced oral cancer image data from high-risk population

Abstract. Significance: Early detection of oral cancer is vital for high-risk patients, and machine learning-based automatic classification is well suited to disease screening. However, current datasets collected from high-risk populations are imbalanced, which often degrades classification performance. Aim: To reduce the class bias caused by data imbalance. Approach: We collected 3851 polarized white light cheek mucosa images using our customized oral cancer screening device. We used weight balancing, data augmentation, undersampling, focal loss, and ensemble methods to improve neural network performance on oral cancer image classification with the imbalanced multi-class datasets captured from high-risk populations during oral cancer screening in low-resource settings. Results: Applying both data-level and algorithm-level approaches to the deep learning training process improved performance on the minority classes, which were initially difficult to distinguish. The accuracy of the “premalignancy” class also increased, which is desirable for screening applications. Conclusions: Experimental results show that the class bias induced by imbalanced oral cancer image datasets can be reduced using both data- and algorithm-level methods. Our study may provide an important basis for understanding the influence of imbalanced datasets on oral cancer deep learning classifiers and how to mitigate it.
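Two of the algorithm-level methods named in the abstract, weight balancing and focal loss, can be illustrated with a short sketch. This is not the authors' implementation: the function names, the inverse-frequency weighting rule, and the default focusing parameter `gamma=2.0` are assumptions chosen for illustration, and the sketch uses NumPy on already-computed softmax probabilities rather than a deep learning framework.

```python
import numpy as np


def balanced_class_weights(labels, num_classes):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumed weighting rule for illustration; rarer classes get larger weights.
    """
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    # Guard against empty classes to avoid division by zero.
    return counts.sum() / (num_classes * np.maximum(counts, 1.0))


def focal_loss(probs, labels, gamma=2.0, class_weights=None):
    """Mean multi-class focal loss on softmax probabilities.

    probs:  array of shape (n_samples, n_classes), rows sum to 1
    labels: integer array of shape (n_samples,)
    gamma:  focusing parameter; gamma = 0 reduces to cross-entropy
    """
    probs = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    labels = np.asarray(labels)
    # Probability assigned to the true class of each sample.
    p_t = probs[np.arange(len(labels)), labels]
    # Down-weight well-classified samples by (1 - p_t)^gamma.
    loss = -((1.0 - p_t) ** gamma) * np.log(p_t)
    if class_weights is not None:
        # Optional per-class weighting (weight balancing).
        loss = loss * np.asarray(class_weights)[labels]
    return loss.mean()


# Toy example: three majority-class samples and one minority-class sample.
labels = np.array([0, 0, 0, 1])
probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.4, 0.6]])
weights = balanced_class_weights(labels, num_classes=2)
weighted_focal = focal_loss(probs, labels, gamma=2.0, class_weights=weights)
```

With `gamma` set to 0 and no class weights, the function reduces to ordinary cross-entropy, which makes it easy to compare the two losses on the same predictions.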
