Mixed Transformer U-Net for Medical Image Segmentation

Though U-Net has achieved tremendous success in medical image segmentation tasks, it lacks the ability to explicitly model long-range dependencies. Vision Transformers have therefore recently emerged as alternative segmentation architectures, owing to their innate ability to capture long-range correlations through Self-Attention (SA). However, Transformers usually rely on large-scale pre-training and have high computational complexity. Furthermore, SA can only model self-affinities within a single sample, ignoring potential correlations across the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinity learning. MTM first calculates self-affinities efficiently through our well-designed Local-Global Gaussian-Weighted Self-Attention (LGG-SA). It then mines inter-connections between data samples through External Attention (EA). Using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation. We evaluate our method on two public datasets, and the experimental results show that the proposed method outperforms other state-of-the-art methods. The code is available at: https://github.com/Dootmaan/MT-UNet.
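To make the two attention mechanisms concrete, below is a minimal numpy sketch. The Gaussian weighting is shown in a simplified form, as a distance-based bias added to the attention scores so that nearby tokens are favoured; the paper's actual LGG-SA formulation (and its local-global split) may differ. The External Attention part follows the published EA idea of computing affinities against two small learnable memories shared across the dataset, with its double normalization. All function and parameter names here are illustrative, not taken from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gaussian_weighted_self_attention(x, Wq, Wk, Wv, sigma=2.0):
    """Simplified sketch of Gaussian-weighted self-attention.

    x: (n, d) token features; Wq/Wk/Wv: (d, d) projections.
    Scores are biased by a Gaussian decay over token distance,
    so closer positions contribute more (illustrative form only).
    """
    n, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                 # standard scaled dot-product
    pos = np.arange(n)
    dist2 = (pos[:, None] - pos[None, :]) ** 2    # squared token distance
    scores = scores - dist2 / (2.0 * sigma ** 2)  # Gaussian bias favours local context
    return softmax(scores, axis=-1) @ v

def external_attention(x, Mk, Mv):
    """External Attention with two shared memory units Mk, Mv: (S, d).

    Affinities are computed against the external memories rather than
    within the sample, then double-normalized (softmax over tokens,
    l1 over memory slots) as in the EA paper.
    """
    attn = x @ Mk.T                                          # (n, S) affinities
    attn = softmax(attn, axis=0)                             # normalize over tokens
    attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-9)   # l1 over memory slots
    return attn @ Mv                                         # (n, d) output
```

Because Mk and Mv are shared across all samples, EA lets the model capture dataset-level correlations at linear cost in sequence length, which is the property the abstract contrasts with standard SA.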
