INVE: Interactive Neural Video Editing

We present Interactive Neural Video Editing (INVE), a real-time video editing solution that assists the editing process by consistently propagating sparse frame edits to an entire video clip. Our method is inspired by the recent work on Layered Neural Atlases (LNA). LNA, however, suffers from two major drawbacks: (1) it is too slow for interactive editing, and (2) it offers insufficient support for some editing use cases, including direct frame editing and rigid texture tracking. To address these challenges, we adopt highly efficient network architectures powered by hash-grid encoding to substantially improve processing speed. In addition, we learn bi-directional mapping functions between frames and the atlas, and introduce vectorized editing; together, these enable a much greater variety of edits in both the atlas and the frames directly. Compared to LNA, INVE reduces learning and inference time by a factor of 5 and supports various video editing operations that LNA cannot. We demonstrate the advantages of INVE over LNA for interactive video editing through comprehensive quantitative and qualitative analysis. For video results, please see https://gabriel-huang.github.io/inve/
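The speedup rests on replacing plain coordinate MLP inputs with a multiresolution hash-grid encoding in the spirit of Instant-NGP. The abstract does not give implementation details, so the sketch below is only illustrative: a minimal NumPy version of the encoding for 2D points, with hypothetical parameter choices (number of levels, table size, growth factor) that need not match the actual system.

```python
import numpy as np

def hash_grid_encode(coords, num_levels=4, table_size=2**14,
                     features_per_level=2, base_res=16, growth=2.0):
    """Multiresolution hash-grid encoding of 2D points in [0, 1]^2.

    Each level overlays a grid of increasing resolution. The four grid
    corners surrounding a point are hashed into a small learnable feature
    table, and the corner features are bilinearly interpolated. The
    per-level results are concatenated into one feature vector that is
    fed to a small MLP (omitted here).
    """
    rng = np.random.default_rng(0)  # stands in for learned tables
    tables = [rng.normal(0.0, 1e-2, (table_size, features_per_level))
              for _ in range(num_levels)]
    prime = np.uint64(2654435761)  # spatial hash constant from Instant-NGP

    outs = []
    for lvl in range(num_levels):
        res = int(base_res * growth ** lvl)
        scaled = coords * res
        lo = np.floor(scaled).astype(np.uint64)   # lower-left grid corner
        frac = scaled - np.floor(scaled)          # position inside the cell

        feats = np.zeros((coords.shape[0], features_per_level))
        for dx in (0, 1):
            for dy in (0, 1):
                cx = lo[:, 0] + np.uint64(dx)
                cy = lo[:, 1] + np.uint64(dy)
                # XOR-based spatial hash into the feature table
                idx = ((cx ^ (cy * prime)) % np.uint64(table_size)).astype(np.int64)
                # bilinear interpolation weight for this corner
                wx = frac[:, 0] if dx else 1.0 - frac[:, 0]
                wy = frac[:, 1] if dy else 1.0 - frac[:, 1]
                feats += (wx * wy)[:, None] * tables[lvl][idx]
        outs.append(feats)
    return np.concatenate(outs, axis=-1)
```

Because table lookups and bilinear interpolation replace most of the MLP depth, the downstream network can be tiny, which is what makes the per-pixel frame-to-atlas queries fast enough for interactive use.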
