Mixed Transformer U-Net for Medical Image Segmentation

Though U-Net has achieved tremendous success in medical image segmentation tasks, it lacks the ability to explicitly model long-range dependencies. Vision Transformers have therefore recently emerged as alternative segmentation architectures, owing to their innate ability to capture long-range correlations through Self-Attention (SA). However, Transformers usually rely on large-scale pre-training and have high computational complexity. Furthermore, SA can only model self-affinities within a single sample, ignoring potential correlations across the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinity learning. MTM first calculates self-affinities efficiently through our well-designed Local-Global Gaussian-Weighted Self-Attention (LGG-SA). It then mines inter-connections between data samples through External Attention (EA). Using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation. We evaluate our method on two public datasets, and the experimental results show that the proposed method outperforms other state-of-the-art methods. The code is available at: https://github.com/Dootmaan/MT-UNet.
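To make the two attention mechanisms concrete, below is a minimal numpy sketch. The Gaussian weighting is shown in a simplified form, as a distance-based bias added to the attention scores so that nearby tokens are favoured; the paper's actual LGG-SA formulation (and its local-global split) may differ. The External Attention part follows the published EA idea of computing affinities against two small learnable memories shared across the dataset, with its double normalization. All function and parameter names here are illustrative, not taken from the paper's code.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gaussian_weighted_self_attention(x, Wq, Wk, Wv, sigma=2.0):
    """Simplified sketch of Gaussian-weighted self-attention.

    x: (n, d) token features; Wq/Wk/Wv: (d, d) projections.
    Scores are biased by a Gaussian decay over token distance,
    so closer positions contribute more (illustrative form only).
    """
    n, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(d)                 # standard scaled dot-product
    pos = np.arange(n)
    dist2 = (pos[:, None] - pos[None, :]) ** 2    # squared token distance
    scores = scores - dist2 / (2.0 * sigma ** 2)  # Gaussian bias favours local context
    return softmax(scores, axis=-1) @ v

def external_attention(x, Mk, Mv):
    """External Attention with two shared memory units Mk, Mv: (S, d).

    Affinities are computed against the external memories rather than
    within the sample, then double-normalized (softmax over tokens,
    l1 over memory slots) as in the EA paper.
    """
    attn = x @ Mk.T                                          # (n, S) affinities
    attn = softmax(attn, axis=0)                             # normalize over tokens
    attn = attn / (attn.sum(axis=1, keepdims=True) + 1e-9)   # l1 over memory slots
    return attn @ Mv                                         # (n, d) output
```

Because Mk and Mv are shared across all samples, EA lets the model capture dataset-level correlations at linear cost in sequence length, which is the property the abstract contrasts with standard SA.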
