Space-time Neural Irradiance Fields for Free-Viewpoint Video

We present a method that learns a spatiotemporal neural irradiance field for dynamic scenes from a single video. Our learned representation enables free-viewpoint rendering of the input video. Our method builds upon recent advances in implicit representations. Learning a spatiotemporal irradiance field from a single video poses significant challenges because the video contains only one observation of the scene at any point in time. The problem is inherently ambiguous: the time-varying 3D geometry of the scene admits many legitimate explanations, because changes in geometry (motion) can be traded off against changes in appearance and vice versa. We address this ambiguity by constraining the time-varying geometry of our dynamic scene representation with scene depth estimated by video depth estimation methods, aggregating content from individual frames into a single global representation. We provide an extensive quantitative evaluation and demonstrate compelling free-viewpoint rendering results.
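The core idea — volume-rendering a spatiotemporal field and supervising the rendered depth with an external depth estimate — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `field` function stands in for the learned MLP (here a toy closed-form field so the code runs), and the loss names are hypothetical.

```python
import numpy as np

def field(points, t):
    """Stand-in for the spatiotemporal irradiance field: maps 3D points
    at time t to (color, density). In the paper this is a learned MLP;
    here a toy analytic field keeps the sketch self-contained."""
    rgb = 0.5 * (np.sin(points) + 1.0)                     # (S, 3) fake colors
    center = np.array([0.0, 0.0, 2.0 + 0.1 * t])           # moving blob
    sigma = np.exp(-np.linalg.norm(points - center, axis=-1))
    return rgb, sigma

def render_ray(origin, direction, t, near=0.5, far=4.0, n_samples=64):
    """Volume-render color AND expected depth along one ray at time t."""
    z = np.linspace(near, far, n_samples)                  # sample distances
    pts = origin + z[:, None] * direction                  # (S, 3) samples
    rgb, sigma = field(pts, t)
    delta = np.diff(z, append=z[-1] + (z[1] - z[0]))       # bin widths
    alpha = 1.0 - np.exp(-sigma * delta)                   # per-bin opacity
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    w = trans * alpha                                      # compositing weights
    color = (w[:, None] * rgb).sum(axis=0)                 # rendered color
    depth = (w * z).sum()                                  # expected depth
    return color, depth

def losses(color, depth, observed_color, estimated_depth):
    """Photometric loss on the single observed view, plus the depth
    constraint that disambiguates time-varying geometry."""
    l_rgb = float(np.mean((color - observed_color) ** 2))
    l_depth = float((depth - estimated_depth) ** 2)
    return l_rgb, l_depth
```

Because each time instant is observed from only one viewpoint, the photometric loss alone cannot pin down geometry; the depth term ties the field's rendered depth to the per-frame video depth estimate, which is what resolves the motion/appearance ambiguity described above.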
