Paper Information - Deformable Neural Radiance Fields

Deformable Neural Radiance Fields

We present the first method capable of photorealistically reconstructing a non-rigidly deforming scene using photos/videos captured casually from mobile phones. Our approach -- D-NeRF -- augments neural radiance fields (NeRF) by optimizing an additional continuous volumetric deformation field that warps each observed point into a canonical 5D NeRF. We observe that these NeRF-like deformation fields are prone to local minima, and propose a coarse-to-fine optimization method for coordinate-based models that allows for more robust optimization. By adapting principles from geometry processing and physical simulation to NeRF-like models, we propose an elastic regularization of the deformation field that further improves robustness. We show that D-NeRF can turn casually captured selfie photos/videos into deformable NeRF models that allow for photorealistic renderings of the subject from arbitrary viewpoints, which we dub "nerfies." We evaluate our method by collecting data using a rig with two mobile phones that take time-synchronized photos, yielding train/validation images of the same pose at different viewpoints. We show that our method faithfully reconstructs non-rigidly deforming scenes and reproduces unseen views with high fidelity.
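The coarse-to-fine idea mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the mechanism, so the code below rests on an assumption: that coarse-to-fine optimization is realized by gradually enabling the higher-frequency bands of a sinusoidal positional encoding as training progresses. The function names (`window_weight`, `windowed_encoding`, `alpha_schedule`) and the linear ramp are hypothetical choices for illustration, not the authors' code.

```python
import math


def window_weight(alpha, band):
    """Cosine-eased weight in [0, 1] for frequency band `band`.

    `alpha` is a schedule value: band j is off while alpha <= j,
    fully on once alpha >= j + 1, and eased in between.
    """
    x = min(max(alpha - band, 0.0), 1.0)
    return 0.5 * (1.0 - math.cos(math.pi * x))


def windowed_encoding(x, num_bands, alpha):
    """Sinusoidal encoding of a scalar x with per-band window weights.

    Returns [w0*sin, w0*cos, w1*sin, w1*cos, ...] with frequency 2^j * pi
    for band j (the frequency layout is an assumption of this sketch).
    """
    features = []
    for band in range(num_bands):
        w = window_weight(alpha, band)
        freq = (2.0 ** band) * math.pi
        features.append(w * math.sin(freq * x))
        features.append(w * math.cos(freq * x))
    return features


def alpha_schedule(step, ramp_steps, num_bands):
    """Hypothetical linear ramp of alpha from 0 to num_bands over ramp_steps."""
    return num_bands * min(step / ramp_steps, 1.0)


if __name__ == "__main__":
    # Early in training only low frequencies contribute; later all bands do.
    for step in (0, 50, 100):
        alpha = alpha_schedule(step, ramp_steps=100, num_bands=4)
        print(step, alpha, windowed_encoding(0.25, 4, alpha))
```

Under this assumption, a deformation network fed with `windowed_encoding` would see only smooth, low-frequency inputs at first and could fit coarse motion before finer detail is enabled, which is one plausible way to avoid the local minima the abstract describes.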
