nnFormer: Interleaved Transformer for Volumetric Segmentation

Transformers, the default model of choice in natural language processing, have drawn scant attention from the medical imaging community. Given their ability to exploit long-term dependencies, transformers promise to help atypical convolutional neural networks (convnets) overcome their inherent shortcomings of spatial inductive bias. However, most recently proposed transformer-based segmentation approaches simply treat transformers as auxiliary modules that help encode global context into convolutional representations, without investigating how to optimally combine self-attention (i.e., the core of transformers) with convolution. To address this issue, we introduce nnFormer (i.e., not-another transFormer), a powerful segmentation model with an interleaved architecture based on an empirical combination of self-attention and convolution. In practice, nnFormer learns volumetric representations from 3D local volumes. Compared to the naive voxel-level self-attention implementation, such volume-based operations reduce the computational complexity by approximately 98% and 99.5% on the Synapse and ACDC datasets, respectively. In comparison to prior-art network configurations, nnFormer achieves large improvements over previous transformer-based methods on two commonly used datasets, Synapse and ACDC. For instance, nnFormer outperforms Swin-UNet by over 7 percent on Synapse. Even when compared to nnUNet, currently the best-performing fully convolutional medical segmentation network, nnFormer still provides slightly better performance on Synapse and ACDC. Code and models are available at https://github.com/282857341/nnFormer.
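
To make the complexity argument concrete, the following is a minimal sketch that compares the operation count of self-attention over all voxel tokens with self-attention restricted to non-overlapping 3D local volumes. It assumes the standard window-attention cost model popularized by Swin Transformer, extended to three dimensions; the function names, the token grid, the channel width, and the local volume size are illustrative assumptions and are not taken from the nnFormer configurations, so the sketch is not expected to reproduce the 98% and 99.5% figures.

```python
def global_attention_cost(h, w, d, c):
    """Approximate multiply-adds for self-attention over all h*w*d tokens.

    Assumed cost model: 4*N*C^2 for the Q/K/V/output projections plus
    2*N^2*C for the attention map and the weighted sum, with N = h*w*d.
    """
    n = h * w * d
    return 4 * n * c**2 + 2 * n**2 * c


def local_volume_attention_cost(h, w, d, c, sh, sw, sd):
    """Approximate multiply-adds when attention is limited to local volumes.

    Each token attends only to the sh*sw*sd tokens of its own local volume,
    so the quadratic term N^2 becomes (sh*sw*sd)*N.
    """
    n = h * w * d
    return 4 * n * c**2 + 2 * (sh * sw * sd) * n * c


def relative_saving(h, w, d, c, sh, sw, sd):
    """Fraction of operations saved by the local-volume variant."""
    full = global_attention_cost(h, w, d, c)
    local = local_volume_attention_cost(h, w, d, c, sh, sw, sd)
    return 1.0 - local / full


if __name__ == "__main__":
    # Hypothetical sizes chosen only for illustration.
    saving = relative_saving(h=16, w=16, d=16, c=96, sh=4, sw=4, sd=4)
    print(f"relative saving: {saving:.1%}")
```

Under this cost model the saving grows with the number of tokens, because the quadratic term dominates the global variant while the local variant stays linear in the token count. This is consistent with the abstract reporting different reductions on the two datasets.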
