Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction

We present a super-fast convergence approach to reconstructing the per-scene radiance field from a set of images that capture the scene with known poses. This task, which is often applied to novel view synthesis, has recently been revolutionized by Neural Radiance Field (NeRF) for its state-of-the-art quality and flexibility. However, NeRF and its variants require a lengthy training time, ranging from hours to days for a single scene. In contrast, our approach achieves NeRF-comparable quality and converges rapidly from scratch in less than 15 minutes with a single GPU. We adopt a representation consisting of a density voxel grid for scene geometry and a feature voxel grid with a shallow network for complex view-dependent appearance. Modeling with explicit and discretized volume representations is not new, but we propose two simple yet non-trivial techniques that contribute to fast convergence speed and high-quality output. First, we introduce post-activation interpolation on voxel density, which is capable of producing sharp surfaces at lower grid resolution. Second, direct voxel density optimization is prone to suboptimal geometry solutions, so we robustify the optimization process by imposing several priors. Finally, evaluation on five inward-facing benchmarks shows that our method matches, if not surpasses, NeRF’s quality, yet it only takes about 15 minutes to train from scratch for a new scene. Code: https://github.com/sunset1995/DirectVoxGO.
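To illustrate why the order of interpolation and activation matters for surface sharpness, the following is a minimal 1D sketch, not the released implementation. It assumes a softplus density activation and the standard volume-rendering opacity `alpha = 1 - exp(-density * step)`; neither is spelled out in the abstract, and all function names are hypothetical. It compares interpolating raw grid values and then activating (post-activation) against activating at the grid points and then interpolating.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)); assumed density activation.
    return np.logaddexp(0.0, x)

def alpha_from_density(density, step=1.0):
    # Standard volume-rendering opacity for a ray segment of length `step`.
    return 1.0 - np.exp(-density * step)

def interp_1d(grid, x):
    # Linear interpolation of grid values at continuous positions x.
    return np.interp(x, np.arange(len(grid)), grid)

def alpha_post_activation(raw_grid, x, step=1.0):
    # Interpolate the raw voxel values first, then apply the activations.
    return alpha_from_density(softplus(interp_1d(raw_grid, x)), step)

def alpha_activate_then_interp(raw_grid, x, step=1.0):
    # Apply the activations at the grid points first, then interpolate.
    return interp_1d(alpha_from_density(softplus(raw_grid), step), x)

# Two neighbouring voxels: one empty (very negative), one occupied.
raw_grid = np.array([-20.0, 20.0])
x = np.linspace(0.0, 1.0, 11)  # sample positions between the two voxels

post = alpha_post_activation(raw_grid, x)
pre = alpha_activate_then_interp(raw_grid, x)

for xi, a_post, a_pre in zip(x, post, pre):
    print(f"x={xi:.1f}  post={a_post:.3f}  activate-then-interp={a_pre:.3f}")
```

In this toy setting, activating before interpolation can only produce a linear opacity ramp between the two voxels, whereas post-activation produces a steep transition inside a single cell. This is consistent with the claim that sharp surfaces can be represented at lower grid resolution.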
