Attention-Guided Supervised Contrastive Learning for Semantic Segmentation

Contrastive learning has shown superior performance in embedding global, spatially invariant features in computer vision (e.g., image classification). However, its success in embedding local, spatially variant features is still limited, especially for semantic segmentation. In a per-pixel prediction task, a single image can contain more than one label (e.g., an image containing a cat, a dog, and grass), which makes it difficult to define "positive" or "negative" pairs in a canonical contrastive learning setting. In this paper, we propose an attention-guided supervised contrastive learning approach that highlights a single semantic object at a time as the target. With our design, the same image can be embedded into different semantic clusters by supplying semantic attention (i.e., coarse semantic masks) as an additional input channel. To achieve such attention, a novel two-stage training strategy is presented. We evaluate the proposed method on multi-organ medical image segmentation, our major task, with both in-house data and the BTCV 2015 dataset. Compared with state-of-the-art supervised and semi-supervised training approaches using a ResNet-50 backbone, our proposed pipeline yields substantial improvements of 5.53% and 6.09% in Dice score on the two medical image segmentation cohorts, respectively. The performance of the proposed method on natural images is assessed on the PASCAL VOC 2012 dataset, where it achieves a substantial improvement of 2.75%.
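To make the central idea concrete, the following is a minimal sketch in Python with NumPy, not the authors' implementation. It shows (a) stacking a binary semantic mask onto an image as one extra input channel, so that the same image paired with different masks yields different inputs, and (b) a generic supervised contrastive loss in which embeddings sharing a class label act as positives. The function names `attach_attention` and `supcon_loss`, the temperature value, and the loss formulation are assumptions chosen for illustration; the encoder and the two-stage training strategy are not modeled here.

```python
import numpy as np


def attach_attention(image, mask):
    """Stack a binary semantic mask onto an image as one extra channel.

    image: float array of shape (C, H, W).
    mask:  array of shape (H, W) with values 0 or 1 (hypothetical attention mask).
    Returns an array of shape (C + 1, H, W).
    """
    if image.shape[1:] != mask.shape:
        raise ValueError("image and mask must share the same spatial size")
    return np.concatenate([image, mask[None].astype(image.dtype)], axis=0)


def supcon_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of embeddings.

    embeddings: float array of shape (N, D), one row per (image, mask) input.
    labels:     int array of shape (N,), the class highlighted by each mask.
    Anchors without any positive in the batch are skipped.
    """
    labels = np.asarray(labels)
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # Exclude self-similarity from the softmax denominator.
    self_mask = np.eye(len(labels), dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Numerically stable log-softmax over all other samples.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Positives: same label, different sample.
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = pos.sum(axis=1)
    valid = n_pos > 0
    if not valid.any():
        return 0.0

    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float((-pos_log_prob[valid] / n_pos[valid]).mean())


# Usage: one image, two masks highlighting different (hypothetical) classes.
image = np.random.rand(1, 4, 4)
mask_a = np.zeros((4, 4)); mask_a[:2] = 1   # e.g. class 0 region
mask_b = np.zeros((4, 4)); mask_b[2:] = 1   # e.g. class 1 region
input_a = attach_attention(image, mask_a)   # shape (2, 4, 4)
input_b = attach_attention(image, mask_b)   # same image, different target
```

In this sketch the label attached to each input is the class its mask highlights, which is what lets a single multi-label image contribute unambiguous positive and negative pairs.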
