Optical Flow-Guided Multi-Scale Dense Network for Frame Interpolation

Video frame interpolation is a classic computer vision task that aims to generate intermediate frames between two consecutive frames. Many algorithms address this task by using optical flow to compute dense pixel correspondences. The input images are warped to the position of the interpolated frame according to the estimated flow and then blended to synthesize that frame. However, because flow estimation is difficult, this approach often produces blurry regions and visually unpleasant results. To overcome the limitation of inaccurate flow estimation, we use an end-to-end neural network to refine the interpolation result after warping; the network explicitly uses optical flow but does not depend on it entirely. Moreover, we design a multi-scale dense network for frame interpolation (FIMSDN), which not only makes full use of multi-scale information to handle large motion, but also strengthens feature propagation. Specifically, a pre-trained optical flow network first produces the bidirectional flow between the two input frames. The input images are warped to the middle frame using the estimated flow and then fed, together with the original images, into FIMSDN, which directly estimates the in-between frame. Experimental results show improvements in both objective and subjective quality over other recent optical flow and convolutional neural network (CNN) based methods.
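The warp-and-blend baseline described above, and the way warped frames might be packed as network input, can be sketched in NumPy. This is only an illustration under stated assumptions, not the paper's implementation: the function names `backward_warp` and `naive_middle_frame` are hypothetical, the flows to the middle frame are approximated by assuming linear motion (half of each estimated flow, negated), and channel-wise concatenation of the original and warped frames is an assumed way of feeding them to a network such as FIMSDN.

```python
import numpy as np


def backward_warp(image, flow):
    """Bilinearly sample `image` at (x + u, y + v) for every output pixel.

    image: (H, W, C) float array.
    flow:  (H, W, 2) array holding (u, v) displacements in pixels.
    Sampling positions outside the image are clamped to the border.
    """
    h, w = image.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x = np.clip(xs + flow[..., 0], 0, w - 1)
    y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each sampling position.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Fractional offsets, broadcast over the channel axis.
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]

    top = (1 - wx) * image[y0, x0] + wx * image[y0, x1]
    bottom = (1 - wx) * image[y1, x0] + wx * image[y1, x1]
    return (1 - wy) * top + wy * bottom


def naive_middle_frame(frame0, frame1, flow_01, flow_10):
    """Warp both frames to t = 0.5 and average them.

    Assumption: motion is linear, so the flow from the middle frame back to
    frame 0 is approximated by -0.5 * flow_01, and the flow from the middle
    frame to frame 1 by -0.5 * flow_10.
    """
    warped0 = backward_warp(frame0, -0.5 * flow_01)
    warped1 = backward_warp(frame1, -0.5 * flow_10)

    # Classic baseline: a plain average of the two warped frames.
    blended = 0.5 * (warped0 + warped1)

    # Assumed input layout for a refinement network: originals plus warped
    # frames stacked along the channel axis.
    network_input = np.concatenate([frame0, frame1, warped0, warped1], axis=-1)
    return blended, network_input
```

When the estimated flow is wrong, `warped0` and `warped1` disagree and their average blurs. This is the failure mode that a learned refinement stage, given both the warped and the original frames, is meant to reduce.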
