CiaoSR: Continuous Implicit Attention-in-Attention Network for Arbitrary-Scale Image Super-Resolution

Learning continuous image representations has recently gained popularity for image super-resolution (SR) because it can reconstruct high-resolution images at arbitrary scales from low-resolution inputs. Existing methods mostly ensemble nearby features to predict the new pixel at any queried coordinate in the SR image. Such a local ensemble has two limitations: i) it has no learnable parameters and neglects the similarity of the visual features; ii) it has a limited receptive field and cannot ensemble relevant features from a larger region, even though such features are often important in an image. To address these issues, this paper proposes a continuous implicit attention-in-attention network, called CiaoSR. We explicitly design an implicit attention network to learn the ensemble weights for the nearby local features. Furthermore, we embed a scale-aware attention in this implicit attention network to exploit additional non-local information. Extensive experiments on benchmark datasets demonstrate that CiaoSR significantly outperforms existing single image SR methods with the same backbone. In addition, CiaoSR achieves state-of-the-art performance on the arbitrary-scale SR task. Its effectiveness is also demonstrated in the real-world SR setting. More importantly, CiaoSR can be flexibly integrated into any backbone to improve SR performance.
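
The contrast between a fixed local ensemble and learned, similarity-aware ensemble weights can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CiaoSR implementation. The fixed weights follow the area-based local ensemble used by LIIF, which depends only on geometry. The content-dependent weights use plain scaled dot-product attention as a stand-in for a learned implicit attention. The helper names (`corner_indices`, `area_weights`, `attention_weights`, `ensemble`) and the query vector are hypothetical. The learnable query/key/value projections are omitted, and neighbor features are ensembled directly rather than per-neighbor decoder predictions.

```python
import math


def corner_indices(x, y, h, w):
    """Return the clamped query and the 2x2 LR cells around (x, y).

    Assumption: LR pixel (i, j) sits at continuous coordinate (i, j),
    and the feature map is at least 2x2.
    """
    x = min(max(x, 0.0), h - 1.0)
    y = min(max(y, 0.0), w - 1.0)
    x0 = min(int(math.floor(x)), h - 2)
    y0 = min(int(math.floor(y)), w - 2)
    return x, y, [(x0, y0), (x0, y0 + 1), (x0 + 1, y0), (x0 + 1, y0 + 1)]


def area_weights(x, y, corners):
    """Fixed, LIIF-style weights: area of the sub-rectangle opposite each corner.

    No learnable parameters; the features themselves are never inspected.
    """
    return [(1.0 - abs(x - i)) * (1.0 - abs(y - j)) for i, j in corners]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Content-dependent weights: scaled dot-product attention over the neighbors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


def ensemble(weights, values):
    """Weighted sum of neighbor feature vectors."""
    dim = len(values[0])
    return [sum(wt * v[c] for wt, v in zip(weights, values)) for c in range(dim)]


# Usage: a toy 2x2 LR feature map with 2 channels.
feat = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [0.0, 1.0]]]
x, y, corners = corner_indices(0.5, 0.5, 2, 2)
neigh = [feat[i][j] for i, j in corners]

w_area = area_weights(x, y, corners)      # uniform at the cell center
query = [4.0, 0.0]                        # hypothetical query embedding
w_attn = attention_weights(query, neigh)  # favors neighbors resembling the query

out_area = ensemble(w_area, neigh)
out_attn = ensemble(w_attn, neigh)
print(w_area, w_attn)
print(out_area, out_attn)
```

At the cell center the area weights are uniform regardless of the features. The attention weights shift toward neighbors whose features resemble the query, which is what "neglects the similarity of the visual features" means in concrete terms. Both schemes still see only the 2x2 neighborhood. That is the receptive-field limitation the non-local attention is meant to address.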
