Baking Neural Radiance Fields for Real-Time View Synthesis

Neural volumetric representations such as Neural Radiance Fields (NeRF) have emerged as a compelling technique for learning to represent 3D scenes from images with the goal of rendering photorealistic images of the scene from unobserved viewpoints. However, NeRF’s computational requirements are prohibitive for real-time applications: rendering views from a trained NeRF requires querying a multilayer perceptron (MLP) hundreds of times per ray. We present a method to train a NeRF, then precompute and store (i.e. “bake”) it as a novel representation called a Sparse Neural Radiance Grid (SNeRG) that enables realtime rendering on commodity hardware. To achieve this, we introduce 1) a reformulation of NeRF’s architecture, and 2) a sparse voxel grid representation with learned feature vectors. The resulting scene representation retains NeRF’s ability to render fine geometric details and view-dependent appearance, is compact (averaging less than 90 MB per scene), and can be rendered in real-time (higher than 30 frames per second on a laptop GPU). Actual screen captures are shown in our video.

[1]  Jonathan T. Barron,et al.  Pushing the Boundaries of View Extrapolation With Multiplane Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Jan Kautz,et al.  Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments , 2002 .

[3]  Pat Hanrahan,et al.  All-frequency shadows using non-linear wavelet lighting approximation , 2003, ACM Trans. Graph..

[4]  Ren Ng,et al.  PlenOctrees for Real-time Rendering of Neural Radiance Fields , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Marek Kowalski,et al.  FastNeRF: High-Fidelity Neural Rendering at 200FPS , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7]  Paul Debevec,et al.  DeepView: View Synthesis With Learned Gradient Descent , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[9]  Graham Fyffe,et al.  Stereo Magnification: Learning View Synthesis using Multiplane Images , 2018, ArXiv.

[10]  Yannick Hold-Geoffroy,et al.  Neural Reflectance Fields for Appearance Acquisition , 2020, ArXiv.

[12]  Wei Jiang,et al.  DeRF: Decomposed Radiance Fields , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Justus Thies,et al.  Deferred Neural Rendering: Image Synthesis using Neural Textures , 2019 .

[14]  Jonathan T. Barron,et al.  Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Matthias Nießner,et al.  Real-time 3D reconstruction at scale using voxel hashing , 2013, ACM Trans. Graph..

[16]  Yiyi Liao,et al.  KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Jonathan T. Barron,et al.  Deformable Neural Radiance Fields , 2020, ArXiv.

[18]  Steven M. Seitz,et al.  Photorealistic Scene Reconstruction by Voxel Coloring , 1997, International Journal of Computer Vision.

[19]  M. Levoy,et al.  Fast volume rendering using a shear-warp factorization of the viewing transformation , 1994, SIGGRAPH.

[20]  Michael Bosse,et al.  Unstructured lumigraph rendering , 2001, SIGGRAPH.

[21]  Andreas Geiger,et al.  GRAF: Generative Radiance Fields for 3D-Aware Image Synthesis , 2020, NeurIPS.

[22]  Sylvain Lefebvre,et al.  GigaVoxels: ray-guided streaming for efficient and detailed voxel rendering , 2009, I3D '09.

[23]  Jonathan T. Barron,et al.  NeRV: Neural Reflectance and Visibility Fields for Relighting and View Synthesis , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Lee Westover,et al.  Splatting: a parallel, feed-forward volume rendering algorithm , 1991 .

[25]  Nelson L. Max,et al.  Optical Models for Direct Volume Rendering , 1995, IEEE Trans. Vis. Comput. Graph..

[26]  Jonathan T. Barron,et al.  A General and Adaptive Robust Loss Function , 2017, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Noah Snavely,et al.  Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[29]  Marc Levoy,et al.  Light field rendering , 1996, SIGGRAPH.

[30]  Justus Thies,et al.  Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Hugues Hoppe,et al.  Perfect spatial hashing , 2006, ACM Trans. Graph..

[32]  Gordon Wetzstein,et al.  pi-GAN: Periodic Implicit Generative Adversarial Networks for 3D-Aware Image Synthesis , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Paul Debevec,et al.  Immersive light field video with a layered mesh representation , 2020, ACM Trans. Graph..

[34]  Jan-Michael Frahm,et al.  Deep blending for free-viewpoint image-based rendering , 2018, ACM Trans. Graph..

[35]  Frédo Durand,et al.  Unstructured Light Fields , 2012, Comput. Graph. Forum.

[36]  Michael Goesele,et al.  Let There Be Color! Large-Scale Texturing of 3D Reconstructions , 2014, ECCV.

[37]  Li Zhang,et al.  Soft 3D reconstruction for view synthesis , 2017, ACM Trans. Graph..

[38]  Neil Hunt,et al.  The triangle processor and normal vector shader: a VLSI system for high performance graphics , 1988, SIGGRAPH.

[39]  Gordon Wetzstein,et al.  Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[40]  Kyaw Zaw Lin,et al.  Neural Sparse Voxel Fields , 2020, NeurIPS.

[41]  Jitendra Malik,et al.  Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach , 1996, SIGGRAPH.

[42]  Ravi Ramamoorthi,et al.  Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines , 2019 .

[43]  Marc Levoy,et al.  Efficient ray tracing of volume data , 1990, TOGS.

[44]  Felix Heide,et al.  Neural Scene Graphs for Dynamic Scenes , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  David Salesin,et al.  Surface light fields for 3D photography , 2000, SIGGRAPH.

[46]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[47]  Yu-Ting Tsai,et al.  All-frequency precomputed radiance transfer using spherical radial basis functions and clustered tensor approximation , 2006, SIGGRAPH '06.

[48]  Tomas Akenine-Möller,et al.  Performance per What , 2012 .

[49]  Jonathan T. Barron,et al.  Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains , 2020, NeurIPS.

[50]  Jonathan T. Barron,et al.  IBRNet: Learning Multi-View Image-Based Rendering , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).