Paper Information - Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation

Convolutional neural networks have enabled accurate image super-resolution in real time. However, recent attempts to benefit from temporal correlations in video super-resolution have been limited to naive or inefficient architectures. In this paper, we introduce spatio-temporal sub-pixel convolution networks that effectively exploit temporal redundancies and improve reconstruction accuracy while maintaining real-time speed. Specifically, we discuss the use of early fusion, slow fusion and 3D convolutions for the joint processing of multiple consecutive video frames. We also propose a novel joint motion compensation and video super-resolution algorithm that is orders of magnitude more efficient than competing methods, relying on a fast multi-resolution spatial transformer module that is end-to-end trainable. These contributions provide both higher accuracy and temporally more consistent videos, which we confirm qualitatively and quantitatively. Relative to single-frame models, spatio-temporal networks can either reduce the computational cost by 30% whilst maintaining the same quality or provide a 0.2 dB gain for a similar computational cost. Results on publicly available datasets demonstrate that the proposed algorithms surpass current state-of-the-art performance in both accuracy and efficiency.
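
The abstract names two building blocks without describing their data layout: early fusion of consecutive frames and sub-pixel convolution. The pure-Python sketch below illustrates one plausible reading of each and is not the authors' implementation. The function names `early_fusion` and `pixel_shuffle` are hypothetical, the upscaling factor `r` is an assumed parameter, and the channel ordering is an assumption borrowed from common depth-to-space implementations.

```python
def early_fusion(frames):
    """Concatenate consecutive frames along the channel axis.

    frames: list of T frames; each frame is a list of C channels,
    and each channel is an H x W nested list.
    Returns one tensor with T * C channels (assumed fusion layout).
    """
    fused = []
    for frame in frames:
        fused.extend(frame)
    return fused


def pixel_shuffle(features, r):
    """Rearrange (C * r * r, H, W) features into (C, H * r, W * r).

    Each group of r * r low-resolution channels supplies the r x r
    sub-pixel positions of one high-resolution output channel.
    """
    in_channels = len(features)
    if in_channels % (r * r) != 0:
        raise ValueError("channel count must be divisible by r * r")
    out_channels = in_channels // (r * r)
    h, w = len(features[0]), len(features[0][0])
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(out_channels)]
    for c in range(out_channels):
        for y in range(h * r):
            for x in range(w * r):
                # Assumed ordering: the sub-pixel offset selects the source channel.
                src = c * r * r + (y % r) * r + (x % r)
                out[c][y][x] = features[src][y // r][x // r]
    return out


# Three single-channel 2 x 2 frames become one 3-channel input.
frames = [[[[t, t], [t, t]]] for t in range(3)]
fused = early_fusion(frames)

# Four 1 x 1 channels become one 2 x 2 channel when r = 2.
upscaled = pixel_shuffle([[[0]], [[1]], [[2]], [[3]]], 2)
```

In a real network, learned convolutions would sit between the two steps. Because those convolutions run at low resolution and only the final rearrangement produces high-resolution pixels, this layout is one way the real-time speed mentioned in the abstract could be preserved.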
