Paper information - nnFormer: Interleaved Transformer for Volumetric Segmentation

nnFormer: Interleaved Transformer for Volumetric Segmentation

Transformers, the default model of choice in natural language processing, have drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers are promising to help atypical convolutional neural networks (convnets) overcome their inherent shortcomings of spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as auxiliary modules that help encode global context into convolutional representations, without investigating how to optimally combine self-attention (i.e., the core of transformers) with convolution. To address this issue, we introduce nnFormer (i.e., not-another transFormer), a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution. In practice, nnFormer learns volumetric representations from 3D local volumes. Compared to the naive voxel-level self-attention implementation, such volume-based operations reduce the computational complexity by approximately 98% and 99.5% on the Synapse and ACDC datasets, respectively. In comparison to prior-art network configurations, nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC. For instance, nnFormer outperforms Swin-UNet by over 7 percent on Synapse. Even when compared to nnUNet, currently the best-performing fully convolutional medical segmentation network, nnFormer still provides slightly better performance on Synapse and ACDC. Code and models are available at https://github.com/282857341/nnFormer.
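The complexity reduction claimed in the abstract can be illustrated with a rough operation count. The sketch below is not taken from the nnFormer code. It assumes the commonly used multi-head self-attention cost estimate (linear projections plus attention-matrix terms), and the token grid, window size, and channel width are hypothetical values chosen for illustration, so the result is not expected to reproduce the paper's 98% or 99.5% figures.

```python
def global_attention_cost(num_tokens: int, channels: int) -> int:
    """Approximate cost of self-attention over all tokens at once.

    4*N*C^2 covers the Q, K, V and output projections;
    2*N^2*C covers the attention matrix and the weighted sum.
    """
    return 4 * num_tokens * channels**2 + 2 * num_tokens**2 * channels


def local_attention_cost(num_tokens: int, window_tokens: int, channels: int) -> int:
    """Approximate cost when each token attends only within its local volume.

    The projection term is unchanged; the quadratic term becomes
    linear in N because each token sees only `window_tokens` neighbours.
    """
    return 4 * num_tokens * channels**2 + 2 * window_tokens * num_tokens * channels


def complexity_reduction(num_tokens: int, window_tokens: int, channels: int) -> float:
    """Fraction of the global-attention cost saved by local-volume attention."""
    global_cost = global_attention_cost(num_tokens, channels)
    local_cost = local_attention_cost(num_tokens, window_tokens, channels)
    return 1.0 - local_cost / global_cost


if __name__ == "__main__":
    # Hypothetical configuration (assumption, not the paper's settings):
    # a 32x32x32 token grid, 4x4x4 local volumes, 96 channels.
    tokens = 32 * 32 * 32
    window = 4 * 4 * 4
    saved = complexity_reduction(tokens, window, channels=96)
    print(f"Estimated reduction: {saved:.2%}")
```

Under this estimate the saving grows as the local volume shrinks relative to the full token grid, which is consistent with the different reductions the abstract reports for the two datasets.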
