Deformable Kernel Convolutional Network for Video Extreme Super-Resolution

Video super-resolution, which attempts to reconstruct high-resolution video frames from their corresponding low-resolution versions, has received increasingly more attention in recent years. Most existing approaches opt to use deformable convolution to temporally align neighboring frames and apply traditional spatial attention mechanism (convolution based) to enhance reconstructed features. However, such spatial-only strategies cannot fully utilize temporal dependency among video frames. In this paper, we propose a novel deep learning based VSR algorithm, named Deformable Kernel Spatial Attention Network (DKSAN). Thanks to newly designed Deformable Kernel Convolution Alignment (DKC_Align) and Deformable Kernel Spatial Attention (DKSA) modules, DKSAN can better exploit both spatial and temporal redundancies to facilitate the information propagation across different layers. We have tested DKSAN on AIM2020 Video Extreme Super-Resolution Challenge to super-resolve videos with a scale factor as large as 16. Experimental results demonstrate that our proposed DKSAN can achieve both better subjective and objective performance compared with the existing state-of-the-art EDVR on Vid3oC and IntVID datasets.

[1]  Michael Elad,et al.  Fast and robust multiframe super resolution , 2004, IEEE Transactions on Image Processing.

[2]  Yun Fu,et al.  Image Super-Resolution Using Very Deep Residual Channel Attention Networks , 2018, ECCV.

[3]  Raquel Urtasun,et al.  Understanding the Effective Receptive Field in Deep Convolutional Neural Networks , 2016, NIPS.

[4]  Enhua Wu,et al.  Handling motion blur in multi-frame super-resolution , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[6]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[7]  Deqing Sun,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 on Bayesian Adaptive Video Super Resolution , 2022 .

[8]  Dan Xia,et al.  AIM 2020 Challenge on Video Extreme Super-Resolution: Methods and Results , 2020, ECCV Workshops.

[9]  Jiajun Wu,et al.  Video Enhancement with Task-Oriented Flow , 2018, International Journal of Computer Vision.

[10]  오승준 [서평]「Digital Video Processing」 , 1996 .

[11]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[12]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[13]  Chenliang Xu,et al.  TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution , 2018, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Hua Wang,et al.  Deformable Non-Local Network for Video Super-Resolution , 2019, IEEE Access.

[15]  Chen Change Loy,et al.  EDVR: Video Restoration With Enhanced Deformable Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[16]  Xizhou Zhu,et al.  Deformable Kernels: Adapting Effective Receptive Fields for Object Deformation , 2020, ICLR.

[17]  Aggelos K. Katsaggelos,et al.  Video Super-Resolution With Convolutional Neural Networks , 2016, IEEE Transactions on Computational Imaging.

[18]  Stephen Lin,et al.  Deformable ConvNets V2: More Deformable, Better Results , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Jianbo Shi,et al.  Object Detection in Video with Spatiotemporal Sampling Networks , 2018, ECCV.

[20]  Narendra Ahuja,et al.  Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Berthold K. P. Horn,et al.  Determining Optical Flow , 1981, Other Conferences.

[22]  Yanfang Ye,et al.  Joint Demosaicing and Super-Resolution (JDSR): Network Design and Perceptual Optimization , 2019, IEEE Transactions on Computational Imaging.

[23]  Renjie Liao,et al.  Detail-Revealing Deep Video Super-Resolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Norio Ito,et al.  HDR video super-resolution for future video coding , 2018, 2018 International Workshop on Advanced Image Technology (IWAIT).

[26]  Chao Ren,et al.  Video Super-Resolution via Residual Learning , 2018, IEEE Access.

[27]  Seoung Wug Oh,et al.  Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Shanxin Yuan,et al.  Video Super-Resolution With Temporal Group Attention , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Matthew A. Brown,et al.  Frame-Recurrent Video Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[30]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Xianming Liu,et al.  AIM 2019 Challenge on Video Extreme Super-Resolution: Methods and Results , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[32]  Renjie Liao,et al.  Video Super-Resolution via Deep Draft-Ensemble Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[33]  Zhiwu Huang,et al.  The Vid3oC and IntVID Datasets for Video Super Resolution and Quality Mapping , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[34]  Xianming Liu,et al.  Robust Video Super-Resolution with Learned Temporal Dynamics , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  Yasutaka Matsuo,et al.  Super-resolution for 2K/8K television using wavelet-based image registration , 2017, 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP).

[36]  Siome Goldenstein,et al.  Eyes on the Target: Super-Resolution and License-Plate Recognition in Low-Quality Surveillance Videos , 2017, IEEE Access.

[37]  Xin Li,et al.  SCAN: Spatial Color Attention Networks for Real Single Image Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[38]  Leilei Zhu,et al.  Encoder-Decoder Residual Network for Real Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[39]  Jan P. Allebach,et al.  Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Christian Ledig,et al.  Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Gregory Shakhnarovich,et al.  Recurrent Back-Projection Network for Video Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).