Efficient and Explicit Modelling of Image Hierarchies for Image Restoration

This paper proposes a mechanism to efficiently and explicitly model image hierarchies in the global, regional, and local ranges for image restoration. We start by analyzing two important properties of natural images: cross-scale similarity and anisotropic image features. Motivated by these properties, we propose anchored stripe self-attention, which achieves a good balance between the space and time complexity of self-attention and the modelling capacity beyond the regional range. We then propose a new network architecture, dubbed GRL, that explicitly models image hierarchies in the Global, Regional, and Local ranges via anchored stripe self-attention, window self-attention, and channel-attention-enhanced convolution, respectively. Finally, the proposed network is applied to seven image restoration types, covering both real and synthetic settings, and sets a new state of the art for several of them. Code will be available at https://github.com/ofsoundof/GRL-Image-Restoration.git.
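The abstract does not spell out how anchored self-attention reduces cost, so the following is only a minimal sketch under stated assumptions rather than the authors' implementation. It assumes a small set of M anchor vectors summarizes the N tokens, and that the N x N attention map is replaced by two smaller maps (anchors-to-keys and queries-to-anchors). The stripe partitioning is omitted, and all function names (`anchored_attention`, `attention_map`, `map_entries`) are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def attention_map(queries, keys):
    """Row-wise softmax of scaled dot products, shape len(queries) x len(keys)."""
    d = len(queries[0])
    scores = matmul(queries, transpose(keys))
    return [softmax([s / math.sqrt(d) for s in row]) for row in scores]


def anchored_attention(q, k, v, anchors):
    """Assumed form: softmax(Q A^T) @ (softmax(A K^T) @ V).

    q, k, v have N rows; anchors has M rows with M much smaller than N.
    """
    m_d = attention_map(anchors, k)   # M x N: anchors attend to keys
    z = matmul(m_d, v)                # M x d: values summarized per anchor
    m_e = attention_map(q, anchors)   # N x M: queries attend to anchors
    return matmul(m_e, z)             # N x d


def map_entries(n_tokens, n_anchors=None):
    """Attention-map entries for full (N*N) or anchored (2*N*M) attention."""
    if n_anchors is None:
        return n_tokens * n_tokens
    return 2 * n_tokens * n_anchors
```

Under these assumptions, a 64 x 64 feature map (N = 4096 tokens) with M = 64 anchors needs `map_entries(4096, 64)` attention entries instead of `map_entries(4096)`, which is the kind of space and time saving the abstract alludes to. How the anchors are obtained and how stripes are laid out are left open here.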
