SwinIR: Image Restoration Using Swin Transformer

Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14∼0.45dB, while the total number of parameters can be reduced by up to 67%.

[1]  Xiangchu Feng,et al.  FOCNet: A Fractional Optimal Control Network for Image Denoising , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Kurt Keutzer,et al.  Visual Transformers: Token-based Image Representation and Processing for Computer Vision , 2020, ArXiv.

[3]  Nong Sang,et al.  Fast image super resolution via local regression , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[4]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[5]  Michal Irani,et al.  Nonparametric Blind Super-resolution , 2013, 2013 IEEE International Conference on Computer Vision.

[6]  Xiaoou Tang,et al.  Compression Artifacts Reduction by a Deep Convolutional Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[7]  Shuhang Gu,et al.  MFAGAN: A Compression Framework for Memory-Efficient On-Device Super-Resolution GAN , 2021, SSRN Electronic Journal.

[8]  Kun Zhou,et al.  LAPAR: Linearly-Assembled Pixel-Adaptive Regression Network for Single Image Super-resolution and Beyond , 2021, NeurIPS.

[9]  Lei Zhang,et al.  Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising , 2016, IEEE Transactions on Image Processing.

[10]  Jonathon Shlens,et al.  Revisiting Spatial Invariance with Low-Rank Local Connectivity , 2020, ICML.

[11]  Luc Van Gool,et al.  LocalViT: Bringing Locality to Vision Transformers , 2021, ArXiv.

[12]  Yuchen Fan,et al.  Image Super-Resolution with Non-Local Sparse Attention , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Yu Qiao,et al.  ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks , 2018, ECCV Workshops.

[14]  Mingkui Tan,et al.  Closed-Loop Matters: Dual Regression Networks for Single Image Super-Resolution , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Martin Jaggi,et al.  On the Relationship between Self-Attention and Convolutional Layers , 2019, ICLR.

[16]  Kiyoharu Aizawa,et al.  Sketch-based manga retrieval using manga109 dataset , 2015, Multimedia Tools and Applications.

[17]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[18]  S. Gelly,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2020, ICLR.

[19]  Jianmin Bao,et al.  Uformer: A General U-Shaped Transformer for Image Restoration , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Qi Tian,et al.  Video Super-Resolution with Recurrent Structure-Detail Network , 2020, ECCV.

[21]  Yun Fu,et al.  Residual Non-local Attention Networks for Image Restoration , 2019, ICLR.

[22]  Yulan Guo,et al.  Learning A Single Network for Scale-Arbitrary Super-Resolution , 2020, IEEE International Conference on Computer Vision.

[23]  Gregory Shakhnarovich,et al.  Deep Back-Projection Networks for Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Luc Van Gool,et al.  Boosting Crowd Counting with Transformers , 2021, ArXiv.

[25]  Wangmeng Zuo,et al.  Cross-Scale Internal Graph Neural Network for Image Super-Resolution , 2020, NeurIPS.

[26]  Wen Gao,et al.  Pre-Trained Image Processing Transformer , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Lei Zhang,et al.  Color demosaicking by local directional interpolation and nonlocal adaptive thresholding , 2011, J. Electronic Imaging.

[28]  Zhengjun Zha,et al.  A Model-Driven Deep Unfolding Method for JPEG Artifacts Removal , 2021, IEEE Transactions on Neural Networks and Learning Systems.

[29]  Yun Fu,et al.  Residual Dense Network for Image Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Alessandro Foi,et al.  Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering , 2007, IEEE Transactions on Image Processing.

[31]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[32]  Liang Lin,et al.  Multi-level Wavelet-CNN for Image Restoration , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[33]  Yu Zhang,et al.  Dilated Residual Networks with Symmetric Skip Connection for image denoising , 2019, Neurocomputing.

[34]  Peisong Wang,et al.  ODE-Inspired Network Design for Single Image Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Jian Yang,et al.  MemNet: A Persistent Memory Network for Image Restoration , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[36]  Karen O. Egiazarian,et al.  Pointwise Shape-Adaptive DCT for High-Quality Denoising and Deblocking of Grayscale and Color Images , 2007, IEEE Transactions on Image Processing.

[37]  L. Gool,et al.  Transformer in Convolutional Neural Networks , 2021, ArXiv.

[38]  Luc Van Gool,et al.  Designing a Practical Degradation Model for Deep Blind Image Super-Resolution , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Shuhang Gu,et al.  Unsupervised Real-world Image Super Resolution via Domain-distance Aware Training , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Luca Benini,et al.  CAS-CNN: A deep convolutional neural network for image compression artifact suppression , 2016, 2017 International Joint Conference on Neural Networks (IJCNN).

[41]  Xiaochun Cao,et al.  Correction to: Single Image Super-Resolution via a Holistic Attention Network , 2020, ECCV.

[42]  Luc Van Gool,et al.  Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Matti Pietikäinen,et al.  Deep Learning for Generic Object Detection: A Survey , 2018, International Journal of Computer Vision.

[44]  Wei An,et al.  Learning Parallax Attention for Stereo Image Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  TransCrowd: Weakly-Supervised Crowd Counting with Transformer , 2021, ArXiv.

[46]  Trevor Darrell,et al.  Early Convolutions Help Transformers See Better , 2021, NeurIPS.

[47]  Kyoung Mu Lee,et al.  Enhanced Deep Residual Networks for Single Image Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[48]  Luc Van Gool,et al.  Mutual Affine Network for Spatially Variant Kernel Estimation in Blind Image Super-Resolution , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[49]  L. Davis,et al.  Quantization Guided JPEG Artifact Correction , 2020, ECCV.

[50]  Radu Timofte,et al.  Frequency Separation for Real-World Super-Resolution , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[51]  Michel Barlaud,et al.  Two deterministic half-quadratic regularization algorithms for computed imaging , 1994, Proceedings of 1st International Conference on Image Processing.

[52]  Narendra Ahuja,et al.  Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Jonathon Shlens,et al.  Scaling Local Self-Attention for Parameter Efficient Visual Backbones , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Nicolas Usunier,et al.  End-to-End Object Detection with Transformers , 2020, ECCV.

[55]  Wei Wu,et al.  Feedback Network for Image Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Feng Wu,et al.  JPEG Artifacts Reduction via Deep Convolutional Sparse Coding , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[57]  Yun Fu,et al.  Image Super-Resolution Using Very Deep Residual Channel Attention Networks , 2018, ECCV.

[58]  Yun Fu,et al.  Residual Dense Network for Image Restoration , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[60]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[61]  Eirikur Agustsson,et al.  NTIRE 2017 Challenge on Single Image Super-Resolution: Dataset and Study , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[62]  Nam Ik Cho,et al.  A Pseudo-Blind Convolutional Neural Network for the Reduction of Compression Artifacts , 2020, IEEE Transactions on Circuits and Systems for Video Technology.

[63]  Fritsche Manuel,et al.  Frequency Separation for Real-World Super-Resolution , 2019 .

[64]  Matthieu Cord,et al.  Training data-efficient image transformers & distillation through attention , 2020, ICML.

[65]  Wangmeng Zuo,et al.  Learning a Single Convolutional Super-Resolution Network for Multiple Degradations , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[66]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Chunwei Tian,et al.  Image denoising using deep CNN with batch renormalization , 2020, Neural Networks.

[68]  Yanyun Qu,et al.  LatticeNet: Towards Lightweight Image Super-Resolution with Lattice Block , 2020, ECCV.

[69]  Mohinder Malhotra Single Image Haze Removal Using Dark Channel Prior , 2016 .

[70]  Kyung-Ah Sohn,et al.  Fast, Accurate, and, Lightweight Super-Resolution with Cascading Residual Network , 2018, ECCV.

[71]  Feiyue Huang,et al.  Real-World Super-Resolution via Kernel Estimation and Noise Injection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[72]  Aline Roumy,et al.  Low-Complexity Single-Image Super-Resolution based on Nonnegative Neighbor Embedding , 2012, BMVC.

[73]  Stephen Lin,et al.  Swin Transformer: Hierarchical Vision Transformer using Shifted Windows , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[74]  Luc Van Gool,et al.  A+: Adjusted Anchored Neighborhood Regression for Fast Super-Resolution , 2014, ACCV.

[75]  Ying Shan,et al.  Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data , 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).

[76]  Thomas S. Huang,et al.  Non-Local Recurrent Network for Image Restoration , 2018, NeurIPS.

[77]  Luc Van Gool,et al.  Flow-based Kernel Prior with Application to Blind Super-Resolution , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[78]  Xinbo Gao,et al.  Lightweight Image Super-Resolution with Information Multi-distillation Network , 2019, ACM Multimedia.

[79]  Stefan Roth,et al.  Neural Nearest Neighbors Networks , 2018, NeurIPS.

[80]  Michael Elad,et al.  On Single Image Scale-Up Using Sparse-Representations , 2010, Curves and Surfaces.

[81]  Tao Xiang,et al.  Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[82]  Luc Van Gool,et al.  Plug-and-Play Image Restoration With Deep Denoiser Prior , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[83]  Yiping Duan,et al.  Deep Coupled Feedback Network for Joint Exposure Fusion and Image Super-Resolution , 2021, IEEE Transactions on Image Processing.

[84]  Luc Van Gool,et al.  Video Super-Resolution Transformer , 2021, ArXiv.

[85]  Ashish Vaswani,et al.  Stand-Alone Self-Attention in Vision Models , 2019, NeurIPS.

[86]  Wangmeng Zuo,et al.  Learning Deep CNN Denoiser Prior for Image Restoration , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[87]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[88]  Lei Zhang,et al.  FFDNet: Toward a Fast and Flexible Solution for CNN-Based Image Denoising , 2017, IEEE Transactions on Image Processing.

[89]  Lei Zhang,et al.  Weighted Nuclear Norm Minimization with Application to Image Denoising , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[90]  Wei An,et al.  Unsupervised Degradation Representation Learning for Blind Super-Resolution , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[91]  Qi Tian,et al.  Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation , 2021, ECCV Workshops.

[92]  Yunjin Chen,et al.  Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[93]  Zhihao Xia,et al.  Identifying Recurring Patterns with Deep Neural Networks for Natural Image Denoising , 2018, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[94]  Luc Van Gool,et al.  Anchored Neighborhood Regression for Fast Example-Based Super-Resolution , 2013, 2013 IEEE International Conference on Computer Vision.

[95]  Shu-Tao Xia,et al.  Second-Order Attention Network for Single Image Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).