EFENet: Reference-based Video Super-Resolution with Enhanced Flow Estimation

In this paper, we consider the problem of reference-based video super-resolution(RefVSR), i.e., how to utilize a high-resolution (HR) reference frame to super-resolve a low-resolution (LR) video sequence. The existing approaches to RefVSR essentially attempt to align the reference and the input sequence, in the presence of resolution gap and long temporal range. However, they either ignore temporal structure within the input sequence, or suffer accumulative alignment errors. To address these issues, we propose EFENet to exploit simultaneously the visual cues contained in the HR reference and the temporal information contained in the LR sequence. EFENet first globally estimates cross-scale flow between the reference and each LR frame. Then our novel flow refinement module of EFENet refines the flow regarding the furthest frame using all the estimated flows, which leverages the global temporal information within the sequence and therefore effectively reduces the alignment errors. We provide comprehensive evaluations to validate the strengths of our approach, and to demonstrate that the proposed framework outperforms the state-of-the-art methods.

[1]  Jean-Michel Morel,et al.  Non-Local Means Denoising , 2011, Image Process. Line.

[2]  Qionghai Dai,et al.  Multiscale-VR: Multiscale Gigapixel 3D Panoramic Videography for Virtual Reality , 2020, 2020 IEEE International Conference on Computational Photography (ICCP).

[3]  Matthew A. Brown,et al.  Frame-Recurrent Video Super-Resolution , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4]  Hairong Qi,et al.  Image Super-Resolution by Neural Texture Transfer , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Ashok Veeraraghavan,et al.  Improving resolution and depth-of-field of light field cameras using a hybrid imaging system , 2014, 2014 IEEE International Conference on Computational Photography (ICCP).

[6]  Luc Van Gool,et al.  Anchored Neighborhood Regression for Fast Example-Based Super-Resolution , 2013, 2013 IEEE International Conference on Computer Vision.

[7]  Bernhard Schölkopf,et al.  EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[8]  Qionghai Dai,et al.  Multiscale gigapixel video: A cross resolution image matching and warping approach , 2017, 2017 IEEE International Conference on Computational Photography (ICCP).

[9]  Bernt Schiele,et al.  2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[11]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[12]  Lu Fang,et al.  CrossNet: An End-to-end Reference-based Super Resolution Network using Cross-scale Warping , 2018, ECCV.

[13]  David J. Brady,et al.  CrossNet++: Cross-Scale Large-Parallax Warping for Reference-Based Super-Resolution , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Qionghai Dai,et al.  The Light Field Attachment: Turning a DSLR into a Light Field Camera Using a Low Budget Camera Ring , 2017, IEEE Trans. Vis. Comput. Graph..

[15]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[16]  Zhiyong Gao,et al.  MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  David J. Brady,et al.  Multiscale gigapixel photography , 2012, Nature.

[19]  Thomas S. Huang,et al.  Image super-resolution as sparse representation of raw image patches , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Lu Fang,et al.  Learning Cross-scale Correspondence and Patch-based Synthesis for Reference-based Super-Resolution , 2017, BMVC.

[21]  Yongbing Zhang,et al.  A novel light field super-resolution framework based on hybrid imaging system , 2015, 2015 Visual Communications and Image Processing (VCIP).

[22]  Chen Change Loy,et al.  EDVR: Video Restoration With Enhanced Deformable Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[23]  Gregory Shakhnarovich,et al.  Recurrent Back-Projection Network for Video Super-Resolution , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[25]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Jiajun Wu,et al.  Video Enhancement with Task-Oriented Flow , 2018, International Journal of Computer Vision.

[27]  Lu Fang,et al.  Combining Exemplar-Based Approach and learning-Based Approach for Light Field Super-Resolution Using a Hybrid Imaging System , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[28]  Renjie Liao,et al.  Detail-Revealing Deep Video Super-Resolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).