Learning Temporal Dynamics for Video Super-Resolution: A Deep Learning Approach

Video super-resolution (SR) aims at estimating a high-resolution video sequence from a low-resolution (LR) one. Given that the deep learning has been successfully applied to the task of single image SR, which demonstrates the strong capability of neural networks for modeling spatial relation within one single image, the key challenge to conduct video SR is how to efficiently and effectively exploit the temporal dependence among consecutive LR frames other than the spatial relation. However, this remains challenging because the complex motion is difficult to model and can bring detrimental effects if not handled properly. We tackle the problem of learning temporal dynamics from two aspects. First, we propose a temporal adaptive neural network that can adaptively determine the optimal scale of temporal dependence. Inspired by the inception module in GoogLeNet [1], filters of various temporal scales are applied to the input LR sequence before their responses are adaptively aggregated, in order to fully exploit the temporal relation among the consecutive LR frames. Second, we decrease the complexity of motion among neighboring frames using a spatial alignment network that can be end-to-end trained with the temporal adaptive network and has the merit of increasing the robustness to complex motion and the efficiency compared with the competing image alignment methods. We provide a comprehensive evaluation of the temporal adaptation and the spatial alignment modules. We show that the temporal adaptive design considerably improves the SR quality over its plain counterparts, and the spatial alignment network is able to attain comparable SR performance with the sophisticated optical flow-based approach, but requires a much less running time. Overall, our proposed model with learned temporal dynamics is shown to achieve the state-of-the-art SR results in terms of not only spatial consistency but also the temporal coherence on public video data sets. More information can be found in http://www.ifp.illinois.edu/~dingliu2/videoSR/.

[1]  Michael Elad,et al.  Fast and robust multiframe super resolution , 2004, IEEE Transactions on Image Processing.

[2]  Andrew Zisserman,et al.  Spatial Transformer Networks , 2015, NIPS.

[3]  Enhua Wu,et al.  Handling motion blur in multi-frame super-resolution , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Kyoung Mu Lee,et al.  Deeply-Recursive Convolutional Network for Image Super-Resolution , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Renjie Liao,et al.  Video Super-Resolution via Deep Draft-Ensemble Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  Xiaoou Tang,et al.  Accelerating the Super-Resolution Convolutional Neural Network , 2016, ECCV.

[7]  Luc Van Gool,et al.  Is image super-resolution helpful for other vision tasks? , 2015, 2016 IEEE Winter Conference on Applications of Computer Vision (WACV).

[8]  Alan C. Bovik,et al.  Motion Tuned Spatio-Temporal Quality Assessment of Natural Videos , 2010, IEEE Transactions on Image Processing.

[9]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[10]  Tal Hassner,et al.  Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.

[11]  Xianming Liu,et al.  Robust Video Super-Resolution with Learned Temporal Dynamics , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Thomas Martinetz,et al.  Variability of eye movements when viewing dynamic natural scenes. , 2010, Journal of vision.

[14]  Liang Wang,et al.  Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution , 2015, NIPS.

[15]  Thomas S. Huang,et al.  Studying Very Low Resolution Recognition Using Deep Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Renjie Liao,et al.  Detail-Revealing Deep Video Super-Resolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[17]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[18]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jian Yang,et al.  Image Super-Resolution via Deep Recursive Residual Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  C.-C. Jay Kuo,et al.  MCL-V: A streaming video quality assessment database , 2015, J. Vis. Commun. Image Represent..

[22]  Aggelos K. Katsaggelos,et al.  Video Super-Resolution With Convolutional Neural Networks , 2016, IEEE Transactions on Computational Imaging.

[23]  Nikolas P. Galatsanos,et al.  Maximum a Posteriori Video Super-Resolution Using a New Multichannel Image Prior , 2010, IEEE Transactions on Image Processing.

[24]  Thomas S. Huang,et al.  Learning a Mixture of Deep Networks for Single Image Super-Resolution , 2016, ACCV.

[25]  Xiaoou Tang,et al.  Image Super-Resolution Using Deep Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Michael Elad,et al.  Super-Resolution Without Explicit Subpixel Motion Estimation , 2009, IEEE Transactions on Image Processing.

[27]  Narendra Ahuja,et al.  Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Christian Keimel,et al.  Visual quality of current coding technologies at high definition IPTV bitrates , 2010, 2010 IEEE International Workshop on Multimedia Signal Processing.

[29]  Thomas S. Huang,et al.  Deep Networks for Image Super-Resolution with Sparse Prior , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Christian Ledig,et al.  Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Aggelos K. Katsaggelos,et al.  Sparse Representation-Based Multiple Frame Video Super-Resolution , 2017, IEEE Transactions on Image Processing.

[32]  Thomas S. Huang,et al.  Image Super-Resolution via Dual-State Recurrent Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Rajiv Soundararajan,et al.  Study of Subjective and Objective Quality Assessment of Video , 2010, IEEE Transactions on Image Processing.

[34]  Ce Liu,et al.  Exploring new representations and applications for motion analysis , 2009 .

[35]  Shuicheng Yan,et al.  Video super-resolution based on spatial-temporal recurrent residual networks , 2017, Comput. Vis. Image Underst..

[36]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.