G2LL: Global-To-Local Self-Supervised Learning for Label-Efficient Transformer-Based Skin Lesion Segmentation in Dermoscopy Images

Skin lesion segmentation in dermoscopy images is highly relevant for lesion assessment and subsequent analysis. Recently, transformer-based skin lesion segmentation models have achieved high segmentation accuracy owing to their ability to model long-range dependencies. However, the limited amount of labeled data available for training these models leads to sub-optimal results. In this paper, we propose a Global-to-Local self-supervised Learning (G2LL) method for transformer-based skin lesion segmentation models to alleviate the problem of insufficient annotated data. First, we propose a structure-wise masking strategy for Masked Image Modeling (MIM) that forces the model to learn to reconstruct masked structures by exploring semantic local contexts. Instead of masking patches randomly across the whole view, it computes superpixels to divide each image into several structured regions and then masks a fixed number of patches in each region, which lets the model exploit structural knowledge while addressing shape variance across lesions. Second, we deploy a self-distilling architecture to enhance global context learning: the masked images are fed to a student network, and the corresponding unmasked images are fed to a teacher network for knowledge distillation. Extensive experiments on the ISIC-2017 and ISIC-2019 datasets, containing a total of 28,081 images, show that the proposed approach is superior to state-of-the-art self-supervised learning methods.
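The two components described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and several details are assumptions: the superpixel label map is taken as precomputed (for example by SLIC), each non-overlapping patch is assigned to the superpixel that covers most of its pixels, "a fixed number of patches per region" is read as a hypothetical `patches_per_region` count (capped at the region size), and the self-distillation objective is assumed to be a DINO-style cross-entropy between a sharpened teacher distribution (unmasked view) and the student distribution (masked view), with an exponential-moving-average (EMA) teacher update. The function names, temperatures, and momentum value are illustrative only.

```python
import math
import random
from collections import Counter, defaultdict


def patch_regions(superpixels, patch_size):
    """Assign each non-overlapping patch to the superpixel label covering most of its pixels.

    superpixels: 2-D list (H x W) of integer region labels (assumed precomputed, e.g. by SLIC).
    Returns region labels in row-major patch order; leftover border pixels are ignored.
    """
    h, w = len(superpixels), len(superpixels[0])
    regions = []
    for top in range(0, h - patch_size + 1, patch_size):
        for left in range(0, w - patch_size + 1, patch_size):
            votes = Counter(
                superpixels[y][x]
                for y in range(top, top + patch_size)
                for x in range(left, left + patch_size)
            )
            regions.append(votes.most_common(1)[0][0])  # majority label of this patch
    return regions


def structure_wise_mask(superpixels, patch_size, patches_per_region, seed=0):
    """Mask up to `patches_per_region` randomly chosen patches inside every region (True = masked)."""
    rng = random.Random(seed)
    regions = patch_regions(superpixels, patch_size)
    members = defaultdict(list)  # region label -> indices of its patches
    for idx, label in enumerate(regions):
        members[label].append(idx)
    mask = [False] * len(regions)
    for idxs in members.values():
        for idx in rng.sample(idxs, min(patches_per_region, len(idxs))):
            mask[idx] = True
    return mask


def log_softmax(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(student_logits, teacher_logits, t_student=0.1, t_teacher=0.04):
    """Assumed DINO-style loss: cross-entropy H(p_teacher, p_student).

    The student sees the masked image and the teacher the unmasked one;
    the lower teacher temperature sharpens the teacher's target distribution.
    """
    p_teacher = [math.exp(v) for v in log_softmax(teacher_logits, t_teacher)]
    log_p_student = log_softmax(student_logits, t_student)
    return -sum(pt * ls for pt, ls in zip(p_teacher, log_p_student))


def ema_update(teacher_params, student_params, momentum=0.996):
    """Teacher weights track an exponential moving average of the student weights."""
    return [momentum * t + (1 - momentum) * s for t, s in zip(teacher_params, student_params)]
```

For instance, with a 4 x 4 label map whose left half is region 0 and right half is region 1, `structure_wise_mask(sp, 2, 1)` masks exactly one of the two patches in each half, whereas uniform random masking could leave one region entirely visible. Sampling per region is what ties the masking to image structure rather than to the whole view.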
