Paper Information - Video Frame Interpolation via Cyclic Fine-Tuning and Asymmetric Reverse Flow


The objective in video frame interpolation is to predict additional in-between frames in a video while retaining natural motion and good visual quality. In this work, we use a convolutional neural network (CNN) that takes two frames as input and predicts two optical flows with pixelwise weights. The flows are from an unknown in-between frame to the input frames. The input frames are warped with the predicted flows, multiplied by the predicted weights, and added to form the in-between frame. We also propose a new strategy to improve the performance of video frame interpolation models: we reconstruct the original frames using the learned model by reusing the predicted frames as input for the model. This is used during inference to fine-tune the model so that it predicts the best possible frames. Our model outperforms the publicly available state-of-the-art methods on multiple datasets.
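The synthesis step described above (warp each input frame with its predicted flow, multiply by its pixelwise weight, and add) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `backward_warp` and `synthesize`, the bilinear sampling, the border clamping, and the array layouts are all assumptions made for illustration, and the flows and weights here are supplied by hand rather than predicted by a CNN.

```python
import numpy as np

def backward_warp(frame, flow):
    """Sample `frame` at (x + u, y + v) using bilinear interpolation.

    frame: (H, W, C) float array.
    flow:  (H, W, 2) array of (u, v) offsets pointing from the unknown
           in-between frame to `frame`.
    """
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Clamp sample positions to the image border (an assumed boundary rule).
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation fraction
    wy = (y - y0)[..., None]  # vertical interpolation fraction
    top = (1 - wx) * frame[y0, x0] + wx * frame[y0, x1]
    bottom = (1 - wx) * frame[y1, x0] + wx * frame[y1, x1]
    return (1 - wy) * top + wy * bottom

def synthesize(frame0, frame1, flow0, flow1, weight0, weight1):
    """Weighted sum of the two warped input frames.

    weight0, weight1: (H, W) pixelwise weights (hypothetical layout).
    """
    warped0 = backward_warp(frame0, flow0)
    warped1 = backward_warp(frame1, flow1)
    return weight0[..., None] * warped0 + weight1[..., None] * warped1

# Usage with hand-made inputs: zero flows and equal weights give the
# plain average of the two frames.
rng = np.random.default_rng(0)
f0 = rng.random((4, 5, 3))
f1 = rng.random((4, 5, 3))
zero_flow = np.zeros((4, 5, 2))
half = np.full((4, 5), 0.5)
mid = synthesize(f0, f1, zero_flow, zero_flow, half, half)

# A flow of one pixel to the right makes each output pixel read its
# right-hand neighbour in the source frame.
shift = np.zeros((4, 5, 2))
shift[..., 0] = 1.0
shifted = backward_warp(f0, shift)
```

In the cyclic strategy the abstract describes, frames produced by such a synthesis step would be fed back into the model to reconstruct the original frames; that training loop is outside the scope of this sketch.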
