TransClaw U-Net: Claw U-Net With Transformers for Medical Image Segmentation

In recent years, computer-aided diagnosis has become an increasingly popular research topic. Methods based on convolutional neural networks have achieved good performance in medical image segmentation and classification. However, because the convolution operation is local, long-range spatial features are often not captured accurately. We therefore propose TransClaw U-Net, a network structure that combines the convolution operation with the transformer operation in the encoding part. The convolution part extracts shallow spatial features, which facilitates recovery of the image resolution after upsampling. The transformer part encodes the patches, and its self-attention mechanism captures global information across the patch sequence. The decoding part retains the bottom upsampling structure for better detail segmentation performance. Experimental results on the Synapse multi-organ segmentation dataset show that TransClaw U-Net performs better than the other network structures compared. The ablation experiments also demonstrate the generalization performance of TransClaw U-Net.
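
The transformer step described in the abstract (encode patches, then apply self-attention across the patch sequence) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `patchify` and `self_attention`, the feature-map size, the patch size, the embedding width `d_model`, and the random weights are all assumptions chosen for illustration, and only a single attention head is shown, without positional embeddings, layer normalization, or MLP blocks.

```python
import numpy as np

def patchify(feature_map, patch_size):
    """Split an (H, W, C) feature map into flattened non-overlapping patches."""
    h, w, c = feature_map.shape
    assert h % patch_size == 0 and w % patch_size == 0
    gh, gw = h // patch_size, w // patch_size
    x = feature_map.reshape(gh, patch_size, gw, patch_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # -> (gh, gw, p, p, c)
    return x.reshape(gh * gw, patch_size * patch_size * c)

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row: attention over all patches
    return weights @ v, weights

# Hypothetical sizes: an 8x8 feature map with 4 channels, 2x2 patches.
rng = np.random.default_rng(0)
feature_map = rng.standard_normal((8, 8, 4))
tokens = patchify(feature_map, patch_size=2)   # 16 patches, 16 values each

d_model = 8                                    # assumed embedding width
w_embed = rng.standard_normal((tokens.shape[1], d_model)) * 0.1
embedded = tokens @ w_embed                    # linear patch embedding

w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(3))
out, weights = self_attention(embedded, w_q, w_k, w_v)
```

Each row of `weights` spans all 16 patches, so every output token can draw on information from the whole feature map, which is the global context that a single convolution layer does not provide.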
