Video Frame Interpolation Via Residue Refinement

Video frame interpolation achieves temporal super-resolution by generating smooth transitions between frames. Although deep neural networks have achieved great success on this task, the synthesized images still suffer from poor visual quality and noticeable artifacts. In this paper, we propose a novel network structure that leverages residue refinement and adaptive weighting to synthesize in-between frames. The residue refinement technique is applied to both optical flow estimation and image generation for higher accuracy and better visual appearance, while the adaptive weight map combines the forward- and backward-warped frames to reduce artifacts. Moreover, all submodules in our method are implemented as U-Nets with reduced depth, so efficiency is guaranteed. Experiments on public datasets demonstrate the effectiveness and superiority of our method over state-of-the-art approaches.
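The adaptive-weight blending described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions, not the paper's implementation: `blend_warped_frames` is a hypothetical helper, and the weight map, which in the proposed method would be predicted by a sub-network, is simply taken as an input here.

```python
import numpy as np

def blend_warped_frames(warped_fwd, warped_bwd, weight_map):
    """Blend forward- and backward-warped frames with a per-pixel weight map.

    weight_map holds values in [0, 1]; a value of 1 keeps only the
    forward-warped pixel, 0 keeps only the backward-warped one.
    """
    w = np.clip(weight_map, 0.0, 1.0)[..., None]  # broadcast over color channels
    return w * warped_fwd + (1.0 - w) * warped_bwd

# Toy example: 2x2 RGB frames (stand-ins for real warped frames).
f = np.ones((2, 2, 3), dtype=np.float32)     # forward-warped frame (all 1s)
b = np.zeros((2, 2, 3), dtype=np.float32)    # backward-warped frame (all 0s)
w = np.full((2, 2), 0.25, dtype=np.float32)  # per-pixel weights
out = blend_warped_frames(f, b, w)           # every pixel becomes 0.25
```

Per-pixel weighting lets the network favor whichever warped frame is more reliable at each location, e.g. near occlusions where one warping direction produces artifacts.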
