Zooming SlowMo: An Efficient One-Stage Framework for Space-Time Video Super-Resolution

In this paper, we address the space-time video super-resolution, which aims at generating a high-resolution (HR) slow-motion video from a low-resolution (LR) and low frame rate (LFR) video sequence. A naive method is to decompose it into two sub-tasks: video frame interpolation (VFI) and video super-resolution (VSR). Nevertheless, temporal interpolation and spatial upscaling are intra-related in this problem. Two-stage approaches cannot fully make use of this natural property. Besides, state-of-the-art VFI or VSR deep networks usually have a large frame reconstruction module in order to obtain high-quality photo-realistic video frames, which makes the two-stage approaches have large models and thus be relatively time-consuming. To overcome the issues, we present a one-stage space-time video super-resolution framework, which can directly reconstruct an HR slow-motion video sequence from an input LR and LFR video. Instead of reconstructing missing LR intermediate frames as VFI models do, we temporally interpolate LR frame features of the missing LR frames capturing local temporal contexts by a feature temporal interpolation module. Extensive experiments on widely used benchmarks demonstrate that the proposed framework not only achieves better qualitative and quantitative performance on both clean and noisy LR frames but also is several times faster than recent state-of-the-art two-stage networks. The source code is released in https:// github.com/ Mukosame/ Zooming-Slow-Mo-CVPR-2020.

[1]  Lei Zhang,et al.  Beyond a Gaussian Denoiser: Residual Learning of Deep CNN for Image Denoising , 2016, IEEE Transactions on Image Processing.

[2]  Norimichi Ukita,et al.  Space-Time-Aware Multi-Resolution Video Enhancement , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Jing Chen,et al.  Learning Event-Driven Video Deblurring and Interpolation , 2020, ECCV.

[4]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  Qing Ling,et al.  D3: Deep Dual-Domain Based Fast Restoration of JPEG-Compressed Images , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[7]  Hongdong Li,et al.  Learning Image Matching by Simply Watching Video , 2016, ECCV.

[8]  Peyman Milanfar,et al.  Spatiotemporal Video Upscaling Using Motion-Assisted Steering Kernel (MASK) Regression , 2010 .

[9]  Wangmeng Zuo,et al.  Learning Deep CNN Denoiser Prior for Image Restoration , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Yaron Caspi,et al.  Increasing Space-Time Resolution in Video , 2002, ECCV.

[11]  Max Grosse,et al.  Phase-based frame interpolation for video , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Yun Fu,et al.  Image Super-Resolution Using Very Deep Residual Channel Attention Networks , 2018, ECCV.

[13]  Subhashis Banerjee,et al.  Space-Time Super-Resolution Using Graph-Cut Optimization , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Zhiyong Gao,et al.  MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Wenbo Li,et al.  MuCAN: Multi-Correspondence Aggregation Network for Video Super-Resolution , 2020, ECCV.

[16]  H. Sebastian Seung,et al.  Natural Image Denoising with Convolutional Networks , 2008, NIPS.

[17]  Michal Irani,et al.  Space-time super-resolution from a single video , 2011, CVPR 2011.

[18]  Wei An,et al.  Learning for Video Super-Resolution through HR Optical Flow Estimation , 2018, ACCV.

[19]  Michal Irani,et al.  Across Scales \& Across Dimensions: Temporal Super-Resolution using Deep Internal Learning , 2020, ECCV.

[20]  Jan Kautz,et al.  Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Jan P. Allebach,et al.  Zooming Slow-Mo: Fast and Accurate One-Stage Space-Time Video Super-Resolution , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[23]  Wei Wang,et al.  Video Super-Resolution via Bidirectional Recurrent Convolutional Networks , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Soo Ye Kim,et al.  FISR: Deep Joint Frame Interpolation and Super-Resolution with A Multi-scale Temporal Loss , 2020, AAAI.

[25]  Frank Hutter,et al.  SGDR: Stochastic Gradient Descent with Warm Restarts , 2016, ICLR.

[26]  Chang-Su Kim,et al.  BMBC: Bilateral Motion Estimation with Bilateral Cost Volume for Video Interpolation , 2020, ECCV.

[27]  Jan P. Allebach,et al.  Boosting High-Level Vision with Joint Compression Artifacts Reduction and Super-Resolution , 2020, 2020 25th International Conference on Pattern Recognition (ICPR).

[28]  Kyoung Mu Lee,et al.  Scene-Adaptive Video Frame Interpolation via Meta-Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Deqing Sun,et al.  A Bayesian approach to adaptive video super resolution , 2011, CVPR 2011.

[30]  Feng Liu,et al.  Video Frame Interpolation via Adaptive Separable Convolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Chenliang Xu,et al.  TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution , 2018, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Chen Change Loy,et al.  EDVR: Video Restoration With Enhanced Deformable Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[33]  Jian Yang,et al.  MemNet: A Persistent Memory Network for Image Restoration , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[34]  Lei Zhang,et al.  FFDNet: Toward a Fast and Flexible Solution for CNN-Based Image Denoising , 2017, IEEE Transactions on Image Processing.

[35]  Gregory Shakhnarovich,et al.  Recurrent Back-Projection Network for Video Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Daniel Rueckert,et al.  Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Xiaoou Tang,et al.  Video Frame Synthesis Using Deep Voxel Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[39]  Hongyang Chao,et al.  Building Dual-Domain Representations for Compression Artifacts Reduction , 2016, ECCV.

[40]  Christian Ledig,et al.  Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Zhiwei Xiong,et al.  Space-Time Video Super-Resolution Using Temporal Profiles , 2020, ACM Multimedia.

[42]  Guillermo Sapiro,et al.  Deep Video Deblurring for Hand-Held Cameras , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Dit-Yan Yeung,et al.  Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting , 2015, NIPS.

[44]  Yu-Bin Yang,et al.  Image Restoration Using Very Deep Convolutional Encoder-Decoder Networks with Symmetric Skip Connections , 2016, NIPS.

[45]  Xin Yu,et al.  Bringing a Blurry Frame Alive at High Frame-Rate With an Event Camera , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Jiajun Wu,et al.  Video Enhancement with Task-Oriented Flow , 2018, International Journal of Computer Vision.

[47]  Seoung Wug Oh,et al.  Deep Space-Time Video Upsampling Networks , 2020, ECCV.

[48]  Bohyung Han,et al.  Channel Attention Is All You Need for Video Frame Interpolation , 2020, AAAI.

[49]  Jan Kautz,et al.  Unsupervised Video Interpolation Using Cycle Consistency , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Kyoung Mu Lee,et al.  Enhanced Deep Residual Networks for Single Image Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[51]  Feng Liu,et al.  Context-Aware Synthesis for Video Frame Interpolation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Yun Fu,et al.  Residual Non-local Attention Networks for Image Restoration , 2019, ICLR.

[53]  Yung-Yu Chuang,et al.  Deep Video Frame Interpolation Using Cyclic Frame Generation , 2019, AAAI.

[54]  Qi Tian,et al.  Video Super-Resolution with Recurrent Structure-Detail Network , 2020, ECCV.

[55]  Feng Liu,et al.  Video Frame Interpolation via Adaptive Convolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[57]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[58]  Xiaoyun Zhang,et al.  Depth-Aware Video Frame Interpolation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Feng Liu,et al.  Softmax Splatting for Video Frame Interpolation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Kyoung Mu Lee,et al.  Deep recurrent resnet for video super-resolution , 2017, 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC).

[61]  Narendra Ahuja,et al.  Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Seoung Wug Oh,et al.  Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[63]  Olga Veksler,et al.  Fast approximate energy minimization via graph cuts , 2001, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[64]  Dinesh Rajan,et al.  Space-time super-resolution from multiple-videos , 2012, 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA).

[65]  Taeoh Kim,et al.  AdaCoF: Adaptive Collaboration of Flows for Video Frame Interpolation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Xiaoou Tang,et al.  Compression Artifacts Reduction by a Deep Convolutional Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[67]  Li Chen,et al.  Blurry Video Frame Interpolation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[68]  Chao Ren,et al.  Space-time super-resolution with patch group cuts prior , 2015, Signal Process. Image Commun..

[69]  Stephen Lin,et al.  Deformable ConvNets V2: More Deformable, Better Results , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.