TED-net: Convolution-free T2T Vision Transformer-based Encoder-decoder Dilation network for Low-dose CT Denoising

Low-dose computed tomography (LDCT) has become mainstream in clinical applications. However, compared with normal-dose CT, LDCT images exhibit stronger noise and more artifacts, which are obstacles to practical use. In recent years, convolution-based end-to-end deep learning methods have been widely used for LDCT image denoising. More recently, transformers have shown superior performance over convolutions owing to richer feature interactions, yet their application to LDCT denoising has not been fully explored. Here, we propose a convolution-free Tokens-to-Token (T2T) vision transformer-based Encoder-decoder Dilation network (TED-net) to enrich the family of LDCT denoising algorithms. The model is free of convolution blocks and consists of a symmetric encoder-decoder built solely from transformer modules. Evaluated on the AAPM-Mayo Clinic LDCT Grand Challenge dataset, TED-net outperforms state-of-the-art denoising methods.
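To make the convolution-free idea concrete, below is a minimal NumPy sketch of the two ingredients the abstract names: a T2T-style "soft split" that unfolds an image into overlapping patch tokens without any convolution, followed by single-head self-attention over those tokens. The function names, the toy identity projections, and the 8x8 input are illustrative assumptions, not the authors' TED-net implementation (a real model would learn the Wq/Wk/Wv projections and stack encoder-decoder blocks).

```python
import numpy as np

def soft_split(img, k=4, s=2):
    """T2T-style 'soft split': unfold an HxW image into overlapping
    k x k patch tokens with stride s. Pure slicing, no convolution."""
    H, W = img.shape
    tokens = []
    for i in range(0, H - k + 1, s):
        for j in range(0, W - k + 1, s):
            tokens.append(img[i:i + k, j:j + k].ravel())
    return np.stack(tokens)  # shape: (n_tokens, k*k)

def self_attention(tokens):
    """Minimal single-head self-attention over the token sequence.
    Identity q/k/v projections are a placeholder; learned weight
    matrices would be used in an actual transformer block."""
    d = tokens.shape[1]
    q = k_ = v = tokens
    scores = q @ k_.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)            # row-wise softmax
    return w @ v                                  # attention-weighted mix

img = np.random.rand(8, 8)          # toy "CT slice"
t = soft_split(img)                 # overlapping patch tokens
out = self_attention(t)             # token mixing via attention
print(t.shape, out.shape)           # (9, 16) (9, 16)
```

The overlap introduced by the stride (s < k) is what lets neighboring tokens share local structure, which is the role convolutions usually play; here it is achieved with tokenization alone.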
