Medical Image Segmentation via Cascaded Attention Decoding

Transformers have shown great promise in medical image segmentation due to their ability to capture long-range dependencies through self-attention. However, they lack the ability to learn the local (contextual) relations among pixels. Previous works try to overcome this problem by embedding convolutional layers either in the encoder or decoder modules of transformers thus ending up sometimes with inconsistent features. To address this issue, we propose a novel attention-based decoder, namely CASCaded Attention DEcoder (CASCADE), which leverages the multi-scale features of hierarchical vision transformers. CASCADE consists of i) an attention gate which fuses features with skip connections and ii) a convolutional attention module that enhances the long-range and local context by suppressing background information. We use a multi-stage feature and loss aggregation framework due to their faster convergence and better performance. Our experiments demonstrate that transformers with CASCADE significantly outperform state-of-the-art CNN- and transformer-based approaches, obtaining up to 5.07% and 6.16% improvements in DICE and mIoU scores, respectively. CASCADE opens new ways of designing better attention-based decoders.

[1]  Dihan Li,et al.  TFCNs: A CNN-Transformer Hybrid Network for Medical Image Segmentation , 2022, ICANN.

[2]  Jionglong Su,et al.  Stepwise Feature Fusion: Local Guides Global , 2022, MICCAI.

[3]  P. Luo,et al.  PVT v2: Improved baselines with Pyramid Vision Transformer , 2021, Computational Visual Media.

[4]  Jianmin Bao,et al.  Uformer: A General U-Shaped Transformer for Image Restoration , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Qi Tian,et al.  Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation , 2021, ECCV Workshops.

[6]  H. Fu,et al.  Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers , 2021, CAAI Artificial Intelligence Research.

[7]  Daijin Kim,et al.  UACANet: Uncertainty Augmented Context Attention for Polyp Segmentation , 2021, ACM Multimedia.

[8]  Anima Anandkumar,et al.  SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers , 2021, NeurIPS.

[9]  Xiang Li,et al.  Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Chunhua Shen,et al.  Conditional Positional Encodings for Vision Transformers , 2021, ICLR.

[11]  Yan Wang,et al.  TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation , 2021, ArXiv.

[12]  Matthieu Cord,et al.  Training data-efficient image transformers & distillation through attention , 2020, ICML.

[13]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[14]  Shuyue Guan,et al.  DC-UNet: rethinking the U-Net architecture with dual channel efficient CNN for medical image segmentation , 2020, Medical Imaging.

[15]  Stephen Lin,et al.  Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[16]  Ling Shao,et al.  PraNet: Parallel Reverse Attention Network for Polyp Segmentation , 2020, MICCAI.

[17]  Lanfen Lin,et al.  UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation , 2020, ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[18]  Xinjian Chen,et al.  CPFNet: Context Pyramid Fusion Network for Medical Image Segmentation , 2020, IEEE Transactions on Medical Imaging.

[19]  Sen Jia,et al.  How Much Position Information Do Convolutional Neural Networks Encode? , 2020, ICLR.

[20]  Thomas de Lange,et al.  Kvasir-SEG: A Segmented Polyp Dataset , 2019, MMM.

[21]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22]  Ling Shao,et al.  ET-Net: A Generic Edge-aTtention Guidance Network for Medical Image Segmentation , 2019, MICCAI.

[23]  Shenghua Gao,et al.  CE-Net: Context Encoder Network for 2D Medical Image Segmentation , 2019, IEEE Transactions on Medical Imaging.

[24]  Frank Hutter,et al.  Decoupled Weight Decay Regularization , 2017, ICLR.

[25]  Ben Wang,et al.  Reverse Attention for Salient Object Detection , 2018, ECCV.

[26]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[27]  Loïc Le Folgoc,et al.  Attention U-Net: Learning Where to Look for the Pancreas , 2018, ArXiv.

[28]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[30]  Antonio M. López,et al.  A Benchmark for Endoluminal Scene Segmentation of Colonoscopy Images , 2016, Journal of healthcare engineering.

[31]  Tat-Seng Chua,et al.  SCA-CNN: Spatial and Channel-Wise Attention in Convolutional Networks for Image Captioning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Nima Tajbakhsh,et al.  Automated Polyp Detection in Colonoscopy Videos Using Shape and Context Information , 2016, IEEE Transactions on Medical Imaging.

[33]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Fernando Vilariño,et al.  WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians , 2015, Comput. Medical Imaging Graph..

[35]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[36]  Aymeric Histace,et al.  Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer , 2014, International Journal of Computer Assisted Radiology and Surgery.