Space-Time Distillation for Video Super-Resolution

Compact video super-resolution (VSR) networks can be easily deployed on resource-limited devices, e.g., smartphones and wearable devices, but they show considerable performance gaps compared with complicated VSR networks that require large amounts of computing resources. In this paper, we aim to improve the performance of compact VSR networks without changing their original architectures, using a knowledge distillation approach that transfers knowledge from a complicated VSR network to a compact one. Specifically, we propose a space-time distillation (STD) scheme that exploits both spatial and temporal knowledge in the VSR task. For space distillation, we extract spatial attention maps that highlight high-frequency video content from both networks, and use them to transfer spatial modeling capabilities from the complicated network to the compact one. For time distillation, we narrow the performance gap between compact and complicated models by distilling the feature similarity of temporal memory cells, which a ConvLSTM encodes from the sequence of feature maps produced over each training clip. During training, STD can be easily incorporated into any network without changing its original architecture. Experimental results on standard benchmarks demonstrate that, in resource-constrained situations, the proposed method notably improves the performance of existing VSR networks without increasing their inference time.
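The two loss terms described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the attention map (channel-wise mean of |F|^p, L2-normalised), the 1x1-kernel ConvLSTM without peephole connections, the cosine self-similarity over spatial positions, the mean-squared distances and the equal loss weights are all assumptions, and every function and class name (`spatial_attention`, `ConvLSTM1x1`, `cell_similarity`, and so on) is hypothetical. Features are (C, H, W) arrays per frame; teacher and student may differ in channel count but are assumed to share the same spatial size.

```python
import numpy as np


def spatial_attention(feat, p=2, eps=1e-8):
    """Spatial attention map of a (C, H, W) feature tensor.

    Assumed definition (attention-transfer style): channel-wise mean of |F|^p,
    flattened and L2-normalised. Averaging over channels lets a narrow
    student be compared with a wide teacher.
    """
    att = np.mean(np.abs(feat) ** p, axis=0).ravel()
    return att / (np.linalg.norm(att) + eps)


def space_distillation_loss(feat_s, feat_t):
    """MSE between student and teacher attention maps (same H, W assumed)."""
    diff = spatial_attention(feat_s) - spatial_attention(feat_t)
    return float(np.mean(diff ** 2))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class ConvLSTM1x1:
    """Hypothetical minimal ConvLSTM cell with 1x1 kernels and no peepholes.

    A 1x1 convolution is a per-pixel linear map, written here with einsum.
    Weights are random for illustration; in training they would be learned.
    """

    def __init__(self, in_ch, hid_ch, rng):
        self.hid_ch = hid_ch
        # One matrix maps [x_t; h_{t-1}] to the four gates (i, f, o, g).
        self.w = rng.standard_normal((4 * hid_ch, in_ch + hid_ch)) * 0.1
        self.b = np.zeros((4 * hid_ch, 1, 1))

    def encode(self, frames):
        """Run over a list of (C, H, W) feature maps; return the last memory cell."""
        _, height, width = frames[0].shape
        h = np.zeros((self.hid_ch, height, width))
        c = np.zeros_like(h)
        for x in frames:
            xh = np.concatenate([x, h], axis=0)
            z = np.einsum("oc,chw->ohw", self.w, xh) + self.b
            i, f, o, g = np.split(z, 4, axis=0)
            c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # memory cell update
            h = sigmoid(o) * np.tanh(c)  # hidden state
        return c


def cell_similarity(cell, eps=1e-8):
    """Assumed similarity: cosine similarity between the per-pixel vectors of
    a (C, H, W) memory cell, giving an (HW, HW) matrix independent of C."""
    v = cell.reshape(cell.shape[0], -1).T
    v = v / (np.linalg.norm(v, axis=1, keepdims=True) + eps)
    return v @ v.T


def time_distillation_loss(cell_s, cell_t):
    """MSE between student and teacher memory-cell similarity matrices."""
    diff = cell_similarity(cell_s) - cell_similarity(cell_t)
    return float(np.mean(diff ** 2))


# Toy clip: 5 frames of 8x8 features; teacher has 64 channels, student 16.
rng = np.random.default_rng(0)
T, H, W = 5, 8, 8
feats_t = [rng.standard_normal((64, H, W)) for _ in range(T)]
feats_s = [rng.standard_normal((16, H, W)) for _ in range(T)]

# Space term averaged over frames; time term from each network's own ConvLSTM.
l_space = float(np.mean([space_distillation_loss(s, t) for s, t in zip(feats_s, feats_t)]))
cell_t = ConvLSTM1x1(64, 32, rng).encode(feats_t)
cell_s = ConvLSTM1x1(16, 8, rng).encode(feats_s)
l_time = time_distillation_loss(cell_s, cell_t)
total = l_space + l_time  # equal weights: an assumption
print(f"space={l_space:.4f} time={l_time:.4f} total={total:.4f}")
```

Both the attention map and the similarity matrix discard the channel dimension, so a compact student can be compared with a wider teacher without adding layers to the student. In this sketch the ConvLSTM encoders serve only to compute the training loss, so they would not be needed at inference time.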
