Flow Supervision for Deformable NeRF

In this paper we present a new method for deformable NeRF that can directly use optical flow as supervision. We overcome the major challenge of the computational inefficiency of enforcing flow constraints on the backward deformation field used by deformable NeRFs. Specifically, we show that inverting the backward deformation function is not needed to compute scene flow between frames. This insight dramatically simplifies the problem, as one is no longer restricted to deformation functions that can be analytically inverted. Instead, because our derivation, based on the inverse function theorem, requires only weak assumptions, our approach extends to a broad class of commonly used backward deformation fields. We present results on monocular novel view synthesis with rapid object motion and demonstrate significant improvements over baselines without flow supervision.
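
As a rough illustration of why no explicit inverse is required, the following sketch derives a point's velocity directly from a backward deformation field. The notation is assumed here for illustration and is not taken from the text above: $W(\mathbf{x}, t)$ denotes a backward deformation mapping a point $\mathbf{x}$ observed at time $t$ to its canonical coordinate $\mathbf{c}$, and $\mathbf{x}(t)$ is the trajectory of one material point. The only requirement is that the spatial Jacobian of $W$ is invertible at the point of interest.

```latex
% Assumed notation (hypothetical, for illustration only):
%   W(x, t) : backward deformation, observation space at time t -> canonical space
%   x(t)    : trajectory of a single material point with fixed canonical coordinate c
\begin{align}
  W\bigl(\mathbf{x}(t),\, t\bigr) &= \mathbf{c}
    && \text{(the canonical coordinate is constant along the trajectory)} \\
  \frac{\partial W}{\partial \mathbf{x}}\,\dot{\mathbf{x}}(t)
    + \frac{\partial W}{\partial t} &= \mathbf{0}
    && \text{(differentiate with respect to } t\text{)} \\
  \dot{\mathbf{x}}(t) &=
    -\left(\frac{\partial W}{\partial \mathbf{x}}\right)^{-1}
     \frac{\partial W}{\partial t}
    && \text{(requires an invertible spatial Jacobian)} \\
  \mathbf{x}(t + \Delta t) - \mathbf{x}(t) &=
    \int_{t}^{t+\Delta t} \dot{\mathbf{x}}(\tau)\, d\tau
    \;\approx\; \Delta t\,\dot{\mathbf{x}}(t)
    && \text{(scene flow; first-order approximation)}
\end{align}
```

Under these assumptions, only a small linear solve against the Jacobian of $W$ is needed at each point, rather than a closed-form inverse of $W$ itself. Projecting the resulting 3D displacement into the image plane would then give a quantity comparable to optical flow.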
