Memory-Augmented Non-Local Attention for Video Super-Resolution

In this paper, we propose a novel video super-resolution method that aims at generating high-fidelity high-resolution (HR) videos from low-resolution (LR) ones. Previous methods predominantly leverage temporal neighbor frames to assist the super-resolution of the current frame. Those methods achieve limited performance as they suffer from the challenge in spatial frame alignment and the lack of useful information from similar LR neighbor frames. In contrast, we devise a cross-frame non-local attention mechanism that allows video super-resolution without frame alignment, leading to be more robust to large motions in the video. In addition, to acquire the information beyond neighbor frames, we design a novel memory-augmented attention module to memorize general video details during the superresolution training. Experimental results indicate that our method can achieve superior performance on large motion videos comparing to the state-of-the-art methods without aligning frames. Our source code will be released.

[1]  Kaiming He,et al.  Group Normalization , 2018, ECCV.

[2]  Chao Dong,et al.  Recovering Realistic Texture in Image Super-Resolution by Deep Spatial Feature Transform , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Chen Change Loy,et al.  EDVR: Video Restoration With Enhanced Deformable Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[4]  Yun Fu,et al.  Residual Non-local Attention Networks for Image Restoration , 2019, ICLR.

[5]  Jason Weston,et al.  End-To-End Memory Networks , 2015, NIPS.

[6]  Yi Yang,et al.  Inflated Episodic Memory With Region Self-Attention for Long-Tailed Visual Recognition , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Kyoung Mu Lee,et al.  Deeply-Recursive Convolutional Network for Image Super-Resolution , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Christian Ledig,et al.  Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Hongyang Chao,et al.  Learning Joint Spatial-Temporal Transformations for Video Inpainting , 2020, ECCV.

[10]  Andrew Zisserman,et al.  Memory-augmented Dense Predictive Coding for Video Representation Learning , 2020, ECCV.

[11]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Yun Fu,et al.  Residual Dense Network for Image Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Thomas S. Huang,et al.  Image Super-Resolution Via Sparse Representation , 2010, IEEE Transactions on Image Processing.

[15]  Georg Heigold,et al.  An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale , 2021, ICLR.

[16]  Jiajun Wu,et al.  Video Enhancement with Task-Oriented Flow , 2018, International Journal of Computer Vision.

[17]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Luc Van Gool,et al.  Anchored Neighborhood Regression for Fast Example-Based Super-Resolution , 2013, 2013 IEEE International Conference on Computer Vision.

[19]  Zhe L. Lin,et al.  Fast Image Super-Resolution Based on In-Place Example Regression , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Chenliang Xu,et al.  TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution , 2018, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Tong Tong,et al.  Image Super-Resolution Using Dense Skip Connections , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22]  Pascal Poupart,et al.  Progressive Memory Banks for Incremental Domain Adaptation , 2018, ICLR.

[23]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[24]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Bernhard Schölkopf,et al.  EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[26]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[27]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[28]  Jian Yang,et al.  Image Super-Resolution via Deep Recursive Residual Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Hairong Qi,et al.  Image Super-Resolution by Neural Texture Transfer , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Thomas S. Huang,et al.  Image Super-Resolution With Cross-Scale Non-Local Attention and Exhaustive Self-Exemplars Mining , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Yun Fu,et al.  Image Super-Resolution Using Very Deep Residual Channel Attention Networks , 2018, ECCV.

[32]  Ashish Vaswani,et al.  Stand-Alone Self-Attention in Vision Models , 2019, NeurIPS.

[33]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Junjun Jiang,et al.  Progressive Fusion Video Super-Resolution Network via Exploiting Non-Local Spatio-Temporal Correlations , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Thomas S. Huang,et al.  Image super-resolution as sparse representation of raw image patches , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[37]  Yu Qiao,et al.  ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks , 2018, ECCV Workshops.

[38]  Shanxin Yuan,et al.  Video Super-Resolution With Temporal Group Attention , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Xianming Liu,et al.  Robust Video Super-Resolution with Learned Temporal Dynamics , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[40]  Renjie Liao,et al.  Detail-Revealing Deep Video Super-Resolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[41]  Xiangyu Xu,et al.  GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Baining Guo,et al.  Learning Texture Transformer Network for Image Super-Resolution , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Narendra Ahuja,et al.  Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Seoung Wug Oh,et al.  Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Matthew A. Brown,et al.  Frame-Recurrent Video Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Chen Change Loy,et al.  BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).