Residual Learning of Video Frame Interpolation Using Convolutional LSTM

Video frame interpolation aims to generate intermediate frames between the original frames, producing video with a higher frame rate and smoother motion. Many video frame interpolation methods first estimate the motion vectors between the input frames and then synthesize the intermediate frame based on that motion. However, these methods depend on the accuracy of the motion estimation step and fail to generate an accurate interpolated frame when the estimated motion vectors are inaccurate. To avoid the uncertainties caused by motion estimation, this paper proposes a method that generates the intermediate frame directly. Since two consecutive frames are relatively similar, our method takes the average of these two frames and uses residual learning to learn the difference between that average and the ground-truth middle frame. In addition, our method uses Convolutional LSTMs and four input frames to better incorporate spatiotemporal information. The network can be trained end to end without difficult-to-obtain data such as optical flow. Our experimental results show that the proposed method performs favorably against other state-of-the-art frame interpolation methods.
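The residual formulation described above can be illustrated with a minimal sketch. It is not the paper's implementation: frames are tiny grayscale grids stored as nested lists, the names `average_frames`, `residual_target`, and `interpolate` are hypothetical, and the Convolutional LSTM network with its four input frames is replaced by a placeholder callable `predict_residual`. The sketch only shows how the training target and the final output relate to the two-frame average.

```python
from typing import Callable, List

Frame = List[List[float]]  # tiny grayscale frame, pixel values in [0, 1]


def average_frames(a: Frame, b: Frame) -> Frame:
    """Pixel-wise mean of two frames of the same size."""
    return [[(pa + pb) / 2.0 for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def residual_target(prev: Frame, nxt: Frame, middle: Frame) -> Frame:
    """Training target: ground-truth middle frame minus the two-frame average."""
    avg = average_frames(prev, nxt)
    return [[m - v for m, v in zip(row_m, row_v)]
            for row_m, row_v in zip(middle, avg)]


def interpolate(prev: Frame, nxt: Frame,
                predict_residual: Callable[[Frame, Frame], Frame]) -> Frame:
    """Output frame = average of the neighbours + predicted residual."""
    avg = average_frames(prev, nxt)
    res = predict_residual(prev, nxt)  # stand-in for the trained network
    return [[v + r for v, r in zip(row_v, row_r)]
            for row_v, row_r in zip(avg, res)]


# Toy 2x2 frames (values chosen to be exact in binary floating point).
prev_frame = [[0.0, 0.5], [1.0, 0.0]]
next_frame = [[1.0, 0.5], [0.0, 0.5]]
true_middle = [[0.25, 0.5], [0.75, 0.5]]

# A "perfect" predictor returns exactly the residual target,
# so the interpolated frame reproduces the ground truth.
oracle = lambda p, n: residual_target(p, n, true_middle)
reconstructed = interpolate(prev_frame, next_frame, oracle)

# A predictor that outputs zeros falls back to plain frame averaging.
zeros = lambda p, n: [[0.0 for _ in row] for row in p]
blended = interpolate(prev_frame, next_frame, zeros)
```

Because consecutive frames are similar, the residual is typically small and sparse compared with the full frame, which is the motivation the abstract gives for learning the difference rather than the frame itself.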
