FeatureFlow: Robust Video Interpolation via Structure-to-Texture Generation

Video interpolation aims to synthesize non-existent frames between two consecutive frames. Although existing optical flow based methods have achieved promising results, they still face great challenges in dealing with the interpolation of complicated dynamic scenes, which include occlusion, blur or abrupt brightness change. This is mainly because these cases may break the basic assumptions of the optical flow estimation (i.e. smoothness, consistency). In this work, we devised a novel structure-to-texture generation framework which splits the video interpolation task into two stages: structure-guided interpolation and texture refinement. In the first stage, deep structure-aware features are employed to predict feature flows from two consecutive frames to their intermediate result, and further generate the structure image of the intermediate frame. In the second stage, based on the generated coarse result, a Frame Texture Compensator is trained to fill in detailed textures. To the best of our knowledge, this is the first work that attempts to directly generate the intermediate frame through blending deep features. Experiments on both the benchmark datasets and challenging occlusion cases demonstrate the superiority of the proposed framework over the state-of-the-art methods. Codes are available on https://github.com/CM-BF/FeatureFlow.

[1]  Jianbo Shi,et al.  Object Detection in Video with Spatiotemporal Sampling Networks , 2018, ECCV.

[2]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[3]  Dacheng Tao,et al.  Attention-GAN for Object Transfiguration in Wild Images , 2018, ECCV.

[4]  Tomer Peleg,et al.  IM-Net for High Resolution Video Frame Interpolation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Dacheng Tao,et al.  Learning from Multiple Teacher Networks , 2017, KDD.

[6]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Chen Change Loy,et al.  EDVR: Video Restoration With Enhanced Deformable Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[8]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[9]  Roberto Castagno,et al.  A method for motion adaptive frame rate up-conversion , 1996, IEEE Trans. Circuits Syst. Video Technol..

[10]  Jinjun Xiong,et al.  Learning Motion in Feature Space: Locally-Consistent Deformable Convolution Networks for Fine-Grained Action Detection , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Dacheng Tao,et al.  Perceptual Adversarial Networks for Image-to-Image Transformation , 2017, IEEE Transactions on Image Processing.

[12]  Xudong Jiang,et al.  Deformable Pose Traversal Convolution for 3D Action and Gesture Recognition , 2018, ECCV.

[13]  Jian Yang,et al.  Occluded Pedestrian Detection Through Guided Attention in CNNs , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[14]  Ming Yang,et al.  Bi-Directional Cascade Network for Perceptual Edge Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[16]  Horst Bischof,et al.  Optical Flow Guided TV-L1 Video Interpolation and Restoration , 2011, EMMCVPR.

[17]  Xiaoou Tang,et al.  Video Frame Synthesis Using Deep Voxel Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[18]  Hongdong Li,et al.  Learning Image Matching by Simply Watching Video , 2016, ECCV.

[19]  Richard Szeliski,et al.  A Database and Evaluation Methodology for Optical Flow , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[20]  Jan Kautz,et al.  Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[21]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[22]  Xiaoyun Zhang,et al.  Depth-Aware Video Frame Interpolation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Yun Fu,et al.  Image Super-Resolution Using Very Deep Residual Channel Attention Networks , 2018, ECCV.

[24]  Lianghui Ding,et al.  High-Order Model and Dynamic Filtering for Frame Rate Up-Conversion , 2018, IEEE Transactions on Image Processing.

[25]  Jeff Donahue,et al.  Large Scale Adversarial Representation Learning , 2019, NeurIPS.

[26]  Jiajun Wu,et al.  Video Enhancement with Task-Oriented Flow , 2018, International Journal of Computer Vision.

[27]  Dacheng Tao,et al.  Tag Disentangled Generative Adversarial Network for Object Image Re-rendering , 2017, IJCAI.

[28]  Yi Li,et al.  Deformable Convolutional Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[29]  Guillermo Sapiro,et al.  Deep Video Deblurring for Hand-Held Cameras , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Alexei A. Efros,et al.  Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[31]  Xianming Liu,et al.  Robust Video Super-Resolution with Learned Temporal Dynamics , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  Feng Liu,et al.  Video Frame Interpolation via Adaptive Separable Convolution , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Feng Liu,et al.  Context-Aware Synthesis for Video Frame Interpolation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Chenliang Xu,et al.  TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution , 2018, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Feng Liu,et al.  Video Frame Interpolation via Adaptive Convolution , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Michel Barlaud,et al.  Two deterministic half-quadratic regularization algorithms for computed imaging , 1994, Proceedings of 1st International Conference on Image Processing.

[37]  Thomas S. Huang,et al.  Non-Local Recurrent Network for Image Restoration , 2018, NeurIPS.

[38]  Ming Yang,et al.  Restricted Deformable Convolution-Based Road Scene Semantic Segmentation Using Surround View Cameras , 2018, IEEE Transactions on Intelligent Transportation Systems.

[39]  Max Grosse,et al.  Phase-based frame interpolation for video , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Zhiyong Gao,et al.  MEMC-Net: Motion Estimation and Motion Compensation Driven Neural Network for Video Interpolation and Enhancement , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  Nenghai Yu,et al.  Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition , 2018, ECCV.

[42]  Xin Yao,et al.  Evolutionary Generative Adversarial Networks , 2018, IEEE Transactions on Evolutionary Computation.

[43]  Xiaokang Yang,et al.  Gated-GAN: Adversarial Gated Networks for Multi-Collection Style Transfer , 2019, IEEE Transactions on Image Processing.