论文信息 - Neural Radiance Flow for 4D View Synthesis and Video Processing

Neural Radiance Flow for 4D View Synthesis and Video Processing

We present a method, Neural Radiance Flow (NeRFlow), to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images. Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene. By enforcing consistency across different modalities, our representation enables multi-view rendering in diverse dynamic scenes, including water pouring, robotic interaction, and real images, outperforming state-of-the-art methods for spatial-temporal view synthesis. Our approach works even when being provided only a single monocular real video. We further demonstrate that the learned representation can serve as an implicit scene prior, enabling video processing tasks such as image super-resolution and de-noising without any additional supervision.

[1] Michael Bosse,et al. Unstructured lumigraph rendering , 2001, SIGGRAPH.

[2] Pat Hanrahan,et al. A signal-processing framework for inverse rendering , 2001, SIGGRAPH.

[3] Gunnar Farnebäck,et al. Two-Frame Motion Estimation Based on Polynomial Expansion , 2003, SCIA.

[4] Richard Szeliski,et al. High-quality video view interpolation using a layered representation , 2004, SIGGRAPH 2004.

[5] Eero P. Simoncelli,et al. Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[6] Paul E. Debevec,et al. Acquisition of time-varying participating media , 2005, ACM Trans. Graph..

[7] Jean-Michel Morel,et al. A non-local algorithm for image denoising , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8] Hans-Peter Seidel,et al. Time-resolved 3d capture of non-stationary gas flows , 2008, SIGGRAPH 2008.

[9] Marcus A. Magnor,et al. Virtual video camera: image-based viewpoint navigation through space and time , 2009, SIGGRAPH '09.

[10] Wojciech Matusik,et al. Moving gradients: a path-based method for plausible image interpolation , 2009, ACM Trans. Graph..

[11] Joseph L. Mundy,et al. Dynamic Probabilistic Volumetric Models , 2013, 2013 IEEE International Conference on Computer Vision.

[12] Daniel Cremers,et al. Generalized Connectivity Constraints for Spatio-temporal 3D Reconstruction , 2014, ECCV.

[13] Michael J. Black,et al. OpenDR: An Approximate Differentiable Renderer , 2014, ECCV.

[14] Derek Bradley,et al. Detailed spatio-temporal reconstruction of eyelids , 2015, ACM Trans. Graph..

[15] Sebastian Scherer,et al. VoxNet: A 3D Convolutional Neural Network for real-time object recognition , 2015, 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).

[16] Jianxiong Xiao,et al. 3D ShapeNets: A deep representation for volumetric shapes , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17] Jean-Yves Guillemaut,et al. General Dynamic Scene Reconstruction from Multiple View Video , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[19] Connor Schenck,et al. Towards Learning to Perceive and Reason About Liquids , 2016, ISER.

[20] Jitendra Malik,et al. View Synthesis by Appearance Flow , 2016, ECCV.

[21] Jiajun Wu,et al. Learning a Probabilistic Latent Space of Object Shapes via 3D Generative-Adversarial Modeling , 2016, NIPS.

[22] Theodore Lim,et al. Generative and Discriminative Voxel Modeling with Convolutional Neural Networks , 2016, ArXiv.

[23] Ting-Chun Wang,et al. Learning-based view synthesis for light field cameras , 2016, ACM Trans. Graph..

[24] Silvio Savarese,et al. 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction , 2016, ECCV.

[25] Jean-Yves Guillemaut,et al. Temporally Coherent 4D Reconstruction of Complex Dynamic Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Jan-Michael Frahm,et al. Structure-from-Motion Revisited , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Yinghao Huang,et al. Towards Accurate Marker-Less Human Shape and Pose Estimation over Time , 2017, 2017 International Conference on 3D Vision (3DV).

[28] Hao Su,et al. A Point Set Generation Network for 3D Object Reconstruction from a Single Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Nassir Navab,et al. Long Short-Term Memory Kalman Filters: Recurrent Neural Estimators for Pose Regularization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30] Gernot Riegler,et al. OctNet: Learning Deep 3D Representations at High Resolutions , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Edmond Boyer,et al. Multi-view Dynamic Shape Refinement Using Local Temporal Integration , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[32] Leonidas J. Guibas,et al. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space , 2017, NIPS.

[33] Minglun Gong,et al. 4D Reconstruction of Blooming Flowers , 2017, Comput. Graph. Forum.

[34] Ersin Yumer,et al. Self-supervised Learning of Motion Capture , 2017, NIPS.

[35] Jitendra Malik,et al. Learning Category-Specific Mesh Reconstruction from Image Collections , 2018, ECCV.

[36] Jan-Michael Frahm,et al. Deep blending for free-viewpoint image-based rendering , 2018, ACM Trans. Graph..

[37] Ning Zhang,et al. Multi-view to Novel View: Synthesizing Novel Views With Self-learned Confidence , 2018, ECCV.

[38] Wei Liu,et al. Pixel2Mesh: Generating 3D Mesh Models from Single RGB Images , 2018, ECCV.

[39] David Duvenaud,et al. Neural Ordinary Differential Equations , 2018, NeurIPS.

[40] Tatsuya Harada,et al. Neural 3D Mesh Renderer , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41] Mathieu Aubry,et al. A Papier-Mache Approach to Learning 3D Surface Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[42] Alexandros G. Dimakis,et al. Compressed Sensing with Deep Image Prior and Learned Regularization , 2018, ArXiv.

[43] Yaser Sheikh,et al. Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[44] Jan Kautz,et al. Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45] Alexei A. Efros,et al. The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46] Andrea Vedaldi,et al. Deep Image Prior , 2017, International Journal of Computer Vision.

[47] Paul Debevec,et al. DeepView: View Synthesis With Learned Gradient Descent , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48] Andreas Geiger,et al. Occupancy Flow: 4D Reconstruction by Learning Particle Dynamics , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[49] Hao Li,et al. Learning to Infer Implicit Surfaces without 3D Supervision , 2019, NeurIPS.

[50] Sebastian Nowozin,et al. Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[51] Shengping Zhang,et al. Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[52] Hao Li,et al. PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[53] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[54] Jitendra Malik,et al. Learning 3D Human Dynamics From Video , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[55] Xiaoyun Zhang,et al. Depth-Aware Video Frame Interpolation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[56] Ulugbek Kamilov,et al. Image Restoration Using Total Variation Regularized Deep Image Prior , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[57] Hao Li,et al. Soft Rasterizer: Differentiable Rendering for Unsupervised Single-View Mesh Reconstruction , 2019, ArXiv.

[58] Duygu Ceylan,et al. DISN: Deep Implicit Surface Network for High-quality Single-view 3D Reconstruction , 2019, NeurIPS.

[59] Yong-Liang Yang,et al. HoloGAN: Unsupervised Learning of 3D Representations From Natural Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[60] Thomas A. Funkhouser,et al. Learning Shape Templates With Structured Implicit Functions , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[61] Gordon Wetzstein,et al. Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[62] Hao Zhang,et al. Learning Implicit Fields for Generative Shape Modeling , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[63] H. Yao,et al. Pix2Vox: Context-Aware 3D Reconstruction From Single and Multi-View Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[64] Richard A. Newcombe,et al. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[65] Gordon Wetzstein,et al. DeepVoxels: Learning Persistent 3D Feature Embeddings , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[66] Andreas Geiger,et al. Texture Fields: Learning Texture Representations in Function Space , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[67] Jan Kautz,et al. Novel View Synthesis of Dynamic Scenes With Globally Coherent Depths From a Monocular Camera , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[68] Richard Szeliski,et al. SynSin: End-to-End View Synthesis From a Single Image , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[69] Silvio Savarese,et al. Interactive Gibson Benchmark: A Benchmark for Interactive Navigation in Cluttered Environments , 2019, IEEE Robotics and Automation Letters.

[70] Karol Myszkowski,et al. X-Fields , 2020, ACM Trans. Graph..

[71] Pratul P. Srinivasan,et al. NeRF , 2020, ECCV.

[72] Richard Szeliski,et al. Consistent video depth estimation , 2020, ACM Trans. Graph..

[73] Qifeng Chen,et al. Blind Video Temporal Consistency via Deep Video Prior , 2020, NeurIPS.

[74] Jonathan T. Barron,et al. Deformable Neural Radiance Fields , 2020, ArXiv.

[75] Andreas Geiger,et al. Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[76] Yaser Sheikh,et al. 4D Visualization of Dynamic Events From Unconstrained Multi-View Videos , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[77] Matthias Zwicker,et al. SDFDiff: Differentiable Rendering of Signed Distance Fields for 3D Shape Optimization , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[78] M. Zollhöfer,et al. Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synthesis of a Dynamic Scene From Monocular Video , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[79] Zhengqi Li,et al. Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[80] Changil Kim,et al. Space-time Neural Irradiance Fields for Free-Viewpoint Video , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[81] Francesc Moreno-Noguer,et al. D-NeRF: Neural Radiance Fields for Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[82] Dimitrios Tzionas,et al. Embodied Hands: Modeling and Capturing Hands and Bodies Together , 2022, ArXiv.