NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location $(x,y,z)$ and viewing direction $(\theta, \phi)$) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons.
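To make the rendering step concrete, the sketch below shows the classic emission-absorption volume rendering quadrature that the method relies on: per-sample densities and colors along a camera ray are alpha-composited front to back into a single pixel color. This is a minimal illustration, not the authors' implementation; the network interface `field(points, direction)` returning densities and RGB radiance is an assumed placeholder.

```python
import numpy as np

def render_ray(sigmas, colors, t_vals):
    """Composite per-sample densities and colors into one pixel color using
    the emission-absorption volume rendering quadrature.

    sigmas : (N,)   volume density at each sample along the ray
    colors : (N, 3) view-dependent RGB radiance at each sample
    t_vals : (N,)   distances of the samples along the ray
    """
    # Spacing between adjacent samples; the last interval is treated as open-ended.
    deltas = np.diff(t_vals, append=1e10)
    # Opacity of each interval: alpha_i = 1 - exp(-sigma_i * delta_i)
    alphas = 1.0 - np.exp(-sigmas * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j): probability the ray
    # reaches sample i without being absorbed earlier.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1] + 1e-10]))
    weights = alphas * trans
    # Expected ray color (front-to-back alpha compositing).
    return (weights[:, None] * colors).sum(axis=0)

# Hypothetical usage with an assumed trained MLP `field(points, direction)`
# that returns (sigmas, colors) at sample points along one camera ray:
#   t_vals = np.linspace(near, far, n_samples)
#   points = origin + t_vals[:, None] * direction
#   sigmas, colors = field(points, direction)
#   pixel_rgb = render_ray(sigmas, colors, t_vals)
```

Because every operation above is differentiable, a photometric loss between rendered and observed pixels can be backpropagated through the compositing step into the network, which is what allows the representation to be optimized from posed images alone.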
