PoSNet: 4x Video Frame Interpolation Using Position-Specific Flow

Video frame interpolation has long been studied, yet it remains a difficult low-level vision task. As optical flow estimation has improved, frame interpolation based on optical flow has become an active research area. However, existing methods are generally tested on high-fps sequences and are designed either for 2× upscaling or for generating multiple frames with a single estimator. This paper proposes a 4× video-interpolation framework that converts 15-fps videos to 60-fps videos using a structure in which flow estimation is followed by an enhancement network. We improve performance by training a specialized flow estimator for each direction and frame position. Furthermore, we feed the original frames and flow maps to the enhancement network as additional inputs to improve subjective image quality. Consequently, the proposed network interpolates high-quality frames with a fast runtime and demonstrates its superiority in the AIM 2019 video temporal super-resolution challenge. The associated code is available at https://github.com/SonghyunYu/PoSNet.
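
The 15-fps to 60-fps conversion implies three new frames between each pair of input frames. The following is a minimal sketch of that arithmetic, assuming normalized time positions between two consecutive inputs and a hypothetical (position, direction) keying of the specialized flow estimators. The function names and the direction labels are illustrative and are not taken from the PoSNet code.

```python
from fractions import Fraction
from typing import List, Tuple


def intermediate_positions(factor: int) -> List[Fraction]:
    """Normalized times (0 = previous input, 1 = next input) of the frames to synthesize."""
    return [Fraction(k, factor) for k in range(1, factor)]


def estimator_keys(factor: int) -> List[Tuple[Fraction, str]]:
    """One hypothetical key per specialized flow estimator.

    Assumption: each intermediate position has one estimator per flow
    direction (toward the previous input and toward the next input).
    """
    directions = ("to_prev", "to_next")  # illustrative labels only
    return [(t, d) for t in intermediate_positions(factor) for d in directions]


def output_fps(input_fps: int, factor: int) -> int:
    """Frame rate after interpolating by an integer factor."""
    return input_fps * factor


positions = intermediate_positions(4)   # 1/4, 1/2, 3/4
keys = estimator_keys(4)                # 3 positions x 2 directions
fps = output_fps(15, 4)                 # 60
```

Under these assumptions, a factor of 4 gives three intermediate positions and six (position, direction) pairs. How the actual network groups or shares its estimators is not specified here.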
