Deep recurrent resnet for video super-resolution

In recent years, the performance of video super-resolution has improved significantly with the help of convolutional neural networks (CNN). Most recent works based on CNN use optical flow to handle video frames. They first compensate the motion and perform multi-frame super-resolution based on aligned frames. However, this 2-step approach has the drawback that the first step can be a bottleneck in overall performance. In this paper, we present a different approach to solving video super-resolution problem without any use of optical flow or motion compensation. We adopt recent advances in a recurrent neural network called long-short-term memory (LSTM) and residual network to deal with consecutive video frames effectively. Compared to the single-frame method, our recurrent model gives superior performance and shows more temporally coherent results.

[1]  Thomas Brox,et al.  A Unified Video Segmentation Benchmark: Annotation, Metrics and Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[2]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Kyoung Mu Lee,et al.  Enhanced Deep Residual Networks for Single Image Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[4]  Aggelos K. Katsaggelos,et al.  Video Super-Resolution With Convolutional Neural Networks , 2016, IEEE Transactions on Computational Imaging.

[5]  Zengfu Wang,et al.  Video Superresolution via Motion Compensation and Deep Residual Learning , 2017, IEEE Transactions on Computational Imaging.

[6]  Jürgen Schmidhuber,et al.  Long Short-Term Memory , 1997, Neural Computation.

[7]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[8]  Renjie Liao,et al.  Video Super-Resolution via Deep Draft-Ensemble Learning , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[9]  Christian Ledig,et al.  Is the deconvolution layer the same as a convolutional layer? , 2016, ArXiv.

[10]  Liang Wang,et al.  Bidirectional Recurrent Convolutional Networks for Multi-Frame Super-Resolution , 2015, NIPS.

[11]  Christian Ledig,et al.  Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Geoffrey E. Hinton,et al.  Rectified Linear Units Improve Restricted Boltzmann Machines , 2010, ICML.

[13]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Thomas Brox,et al.  End-to-End Learning of Video Super-Resolution with Motion Compensation , 2017, GCPR.

[16]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[17]  Deqing Sun,et al.  A Bayesian approach to adaptive video super resolution , 2011, CVPR 2011.