Robust Temporal Super-Resolution for Dynamic Motion Videos

Most video temporal super-resolution methods are difficult to apply to real-world scenes because they are optimized for a narrow range of video characteristics. In this paper, we propose a video temporal super-resolution method that is robust to diverse motion and noise. Our method improves robustness by fine-tuning SPyNet, which was pre-trained on videos with simple motion under moderate conditions. In addition, using a modified DHDN architecture, our network learns to accurately synthesize the two frames produced by a backward warping function without requiring any additional information; this saves the computational cost of pre-training for and extracting such additional information. Finally, we apply the self-ensemble method, which is commonly used in image processing studies but rarely in video processing. Self-ensembling enables our network to generate stable output frames of improved quality without any additional training. Our network ranked 5th in the AIM 2019 video temporal super-resolution challenge, with only a small performance gap to the 3rd- and 4th-ranked solutions. The source code and pre-trained models are available at https://github.com/BumjunPark/DVTSR.
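Since the pipeline is described only at a high level here, the following is a minimal PyTorch sketch of two of the building blocks mentioned above: backward warping of a neighboring frame with an estimated optical flow, and flip-based self-ensembling at test time. The function names `backward_warp` and `self_ensemble` are illustrative and are not taken from the released code; the fine-tuned SPyNet flow estimator and the modified DHDN synthesis network are assumed to be supplied by the caller.

```python
# Minimal sketch, assuming PyTorch. Only the warping and self-ensemble logic
# are shown; the flow estimator and synthesis network are external modules.
import torch
import torch.nn.functional as F


def backward_warp(frame, flow):
    """Warp `frame` (N,C,H,W) with an optical flow field (N,2,H,W) by bilinear sampling."""
    n, _, h, w = frame.shape
    # Base sampling grid in pixel coordinates.
    gy, gx = torch.meshgrid(torch.arange(h, device=frame.device),
                            torch.arange(w, device=frame.device), indexing="ij")
    grid = torch.stack((gx, gy), dim=0).float().unsqueeze(0).expand(n, -1, -1, -1)
    # Shift the grid by the flow, then normalize coordinates to [-1, 1] for grid_sample.
    coords = grid + flow
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=3)  # (N,H,W,2)
    return F.grid_sample(frame, sample_grid, mode="bilinear",
                         padding_mode="border", align_corners=True)


def self_ensemble(model, x):
    """Average the model output over horizontal/vertical flips (test time only)."""
    outputs = []
    for flip_dims in ([], [2], [3], [2, 3]):
        xt = torch.flip(x, flip_dims) if flip_dims else x
        yt = model(xt)
        # Undo the flip so all outputs are aligned before averaging.
        outputs.append(torch.flip(yt, flip_dims) if flip_dims else yt)
    return torch.stack(outputs).mean(dim=0)
```

In this sketch, the two input frames would each be warped toward the intermediate time step with the flows estimated by the fine-tuned flow network, concatenated, and passed to the synthesis network, optionally wrapped by `self_ensemble` at inference.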

[1] Jian Sun, et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[2] Feng Liu, et al. Video Frame Interpolation via Adaptive Convolution, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Alain Trémeau, et al. Residual Conv-Deconv Grid Network for Semantic Segmentation, 2017, BMVC.

[4] Jan Kautz, et al. PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5] Kyoung-Rok Cho, et al. Motion Compensated Frame Rate Up-Conversion Using Extended Bilateral Motion Estimation, 2007, IEEE Transactions on Consumer Electronics.

[6] Feng Liu, et al. Context-Aware Synthesis for Video Frame Interpolation, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7] Tomer Peleg, et al. IM-Net for High Resolution Video Frame Interpolation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Rongxin Jiang, et al. Frame Interpolation Using Phase and Amplitude Feature Pyramids, 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[9] Andrew Zisserman, et al. Spatial Transformer Networks, 2015, NIPS.

[10] Djemel Ziou, et al. Is there a relationship between peak-signal-to-noise ratio and structural similarity index measure?, 2013, IET Image Processing.

[11] Michael J. Black, et al. A Naturalistic Open Source Movie for Optical Flow Evaluation, 2012, ECCV.

[12] Jechang Jeong, et al. Deep Iterative Down-Up CNN for Image Denoising, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[13] Jiajun Wu, et al. Video Enhancement with Task-Oriented Flow, 2018, International Journal of Computer Vision.

[14] Thomas Brox, et al. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[16] Xiaoyun Zhang, et al. Depth-Aware Video Frame Interpolation, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Xiaoou Tang, et al. Video Frame Synthesis Using Deep Voxel Flow, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18] Hongdong Li, et al. Learning Image Matching by Simply Watching Video, 2016, ECCV.

[19] Young Hwan Kim, et al. Direction-Select Motion Estimation for Motion-Compensated Frame Rate Up-Conversion, 2013, Journal of Display Technology.

[20] Jane de Almeida, et al. Ultra High Definition, 2016.

[21] Haopeng Li, et al. FI-Net: A Lightweight Video Frame Interpolation Network Using Feature-Level Flow, 2019, IEEE Access.

[22] Eero P. Simoncelli, et al. Image quality assessment: from error visibility to structural similarity, 2004, IEEE Transactions on Image Processing.

[23] Michael J. Black, et al. Optical Flow Estimation Using a Spatial Pyramid Network, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Markus H. Gross, et al. PhaseNet for Video Frame Interpolation, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[25] Feng Liu, et al. Video Frame Interpolation via Adaptive Separable Convolution, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[26] Shahram Shirani, et al. Frame Rate Upconversion Using Optical Flow and Patch-Based Reconstruction, 2016, IEEE Transactions on Circuits and Systems for Video Technology.

[27] Jian Sun, et al. Deep Residual Learning for Image Recognition, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28] Thomas Brox, et al. FlowNet: Learning Optical Flow with Convolutional Networks, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29] Bohyung Han, et al. AIM 2019 Challenge on Video Temporal Super-Resolution: Methods and Results, 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[30] Chaoli Wang, et al. TSR-TVD: Temporal Super-Resolution for Time-Varying Data Analysis and Visualization, 2020, IEEE Transactions on Visualization and Computer Graphics.

[31] Andrew Gordon Wilson, et al. Averaging Weights Leads to Wider Optima and Better Generalization, 2018, UAI.

[32] Chi-Keung Tang, et al. Deep High Dynamic Range Imaging with Large Foreground Motions, 2017, ECCV.

[33] Jechang Jeong, et al. Densely Connected Hierarchical Network for Image Denoising, 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[34] Peter Pirsch, et al. Array architectures for block matching algorithms, 1989.

[35] Jan Kautz, et al. Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation, 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36] Jan Kautz, et al. Loss Functions for Image Restoration With Neural Networks, 2017, IEEE Transactions on Computational Imaging.

[37] Ali Farhadi, et al. You Only Look Once: Unified, Real-Time Object Detection, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38] Yun Fu, et al. Residual Dense Network for Image Restoration, 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39] Jongyoo Kim, et al. Video Frame Interpolation by Plug-and-Play Deep Locally Linear Embedding, 2018, arXiv.

[40] Luc Van Gool, et al. Seven Ways to Improve Example-Based Single Image Super Resolution, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).