Time Lens: Event-based Video Frame Interpolation

State-of-the-art frame interpolation methods generate intermediate frames by inferring object motions in the image from consecutive key-frames. In the absence of additional information, first-order approximations, i.e., optical flow, must be used, but this choice restricts the types of motions that can be modeled, leading to errors in highly dynamic scenarios. Event cameras are novel sensors that address this limitation by providing auxiliary visual information in the blind-time between frames. They asynchronously measure per-pixel brightness changes and do this with high temporal resolution and low latency. Event-based frame interpolation methods typically adopt a synthesis-based approach, where predicted frame residuals are directly applied to the key-frames. However, while these approaches can capture non-linear motions, they suffer from ghosting artifacts and perform poorly in low-texture regions with few events. Thus, synthesis-based and flow-based approaches are complementary. In this work, we introduce Time Lens, a novel method that leverages the advantages of both. We extensively evaluate our method on three synthetic and two real benchmarks, where we show an improvement of up to 5.21 dB in PSNR over state-of-the-art frame-based and event-based methods. Finally, we release a new large-scale dataset of highly dynamic scenarios, aimed at pushing the limits of existing methods.
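To make the complementarity concrete, the two-branch idea can be illustrated with a short, hypothetical PyTorch sketch. This is not the authors' actual architecture: flow_net (events and a key-frame to a backward flow field), synth_net (key-frames and events to a new frame), and fusion_net (candidate frames to per-pixel weights) are assumed placeholder modules, and the fusion shown here is a simple softmax attention over candidate frames.

import torch
import torch.nn as nn
import torch.nn.functional as F

def backward_warp(frame, flow):
    """Warp a frame (B,C,H,W) with a backward optical-flow field (B,2,H,W)."""
    b, _, h, w = frame.shape
    # Build a pixel-coordinate grid and displace it by the flow.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().to(frame.device)   # (2,H,W)
    coords = grid.unsqueeze(0) + flow                              # (B,2,H,W)
    # Normalize coordinates to [-1, 1] as required by grid_sample.
    coords_x = 2.0 * coords[:, 0] / (w - 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / (h - 1) - 1.0
    sample_grid = torch.stack((coords_x, coords_y), dim=-1)        # (B,H,W,2)
    return F.grid_sample(frame, sample_grid, align_corners=True)

class FusionInterpolator(nn.Module):
    """Fuse warped key-frames with a directly synthesized frame (sketch)."""
    def __init__(self, flow_net, synth_net, fusion_net):
        super().__init__()
        self.flow_net = flow_net      # hypothetical: events + frame -> flow
        self.synth_net = synth_net    # hypothetical: frames + events -> frame
        self.fusion_net = fusion_net  # hypothetical: candidates -> weights

    def forward(self, i0, i1, ev_0t, ev_1t):
        # Warping branch: estimate flow from each key-frame towards the
        # target time t using the events recorded in between, then warp.
        warped0 = backward_warp(i0, self.flow_net(i0, ev_0t))
        warped1 = backward_warp(i1, self.flow_net(i1, ev_1t))
        # Synthesis branch: predict the new frame directly from key-frames
        # and events; handles non-linear motion but may ghost where few
        # events fire.
        synth = self.synth_net(i0, i1, ev_0t, ev_1t)
        # Fusion: per-pixel softmax attention over the candidate frames.
        cands = torch.stack((warped0, warped1, synth), dim=1)  # (B,3,C,H,W)
        w = F.softmax(self.fusion_net(warped0, warped1, synth), dim=1)
        return (w.unsqueeze(2) * cands).sum(dim=1)

The sketch shows why fusing the branches helps: the warping candidates are reliable wherever texture and near-linear motion make flow estimation accurate, while the synthesis candidate covers the non-linear motion that flow cannot model, and a learned per-pixel weighting lets the network pick whichever candidate is trustworthy at each location.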
