Efficient Neural Radiance Fields with Learned Depth-Guided Sampling

This paper aims to reduce the rendering time of generalizable radiance fields. Some recent works equip neural radiance fields with image encoders and are able to generalize across scenes, which avoids the per-scene optimization. However, their rendering process is generally very slow. A major factor is that they sample lots of points in empty space when inferring radiance fields. In this paper, we present a hybrid scene representation which combines the best of implicit radiance fields and explicit depth maps for efficient rendering. Specifically, we first build the cascade cost volume to efficiently predict the coarse geometry of the scene. The coarse geometry allows us to sample few points near the scene surface and significantly improves the rendering speed. This process is fully differentiable, enabling us to jointly learn the depth prediction and radiance field networks from only RGB images. Experiments show that the proposed approach exhibits state-of-the-art performance on the DTU, Real Forward-facing and NeRF Synthetic datasets, while being at least 50 times faster than previous generalizable radiance field methods. We also demonstrate the capability of our method to synthesize free-viewpoint videos of dynamic human performers in real-time. The code will be available at https://zju3dv.github.io/enerf/.

[1]  Jan Kautz,et al.  Extreme View Synthesis , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Hao Su,et al.  MVSNeRF: Fast Generalizable Radiance Field Reconstruction from Multi-View Stereo , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[3]  Kyaw Zaw Lin,et al.  Neural Sparse Voxel Fields , 2020, NeurIPS.

[4]  Ravi Ramamoorthi,et al.  Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines , 2019 .

[5]  Richard Szeliski,et al.  The lumigraph , 1996, SIGGRAPH.

[6]  Gerard Pons-Moll,et al.  Stereo Radiance Fields (SRF): Learning View Synthesis for Sparse Views of Novel Scenes , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Long Quan,et al.  Recurrent MVSNet for High-Resolution Multi-View Stereo Depth Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Sebastian Nowozin,et al.  Occupancy Networks: Learning 3D Reconstruction in Function Space , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Jan-Michael Frahm,et al.  Deep blending for free-viewpoint image-based rendering , 2018, ACM Trans. Graph..

[10]  Christian Theobalt,et al.  Neural Rendering and Reenactment of Human Actor Videos , 2018, ACM Trans. Graph..

[11]  Erran L. Li,et al.  Deep Stereo Using Adaptive Thin Volume Representation With Uncertainty Awareness , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Zhengqi Li,et al.  Crowdsampling the Plenoptic Function , 2020, ECCV.

[13]  Alex Trevithick,et al.  GRF: Learning a General Radiance Field for 3D Representation and Rendering , 2020, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Natalia Gimelshein,et al.  PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.

[15]  C. Theobalt,et al.  Neural Rays for Occlusion-aware Image-based Rendering , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Pratul P. Srinivasan,et al.  NeRF , 2020, ECCV.

[17]  Angjoo Kanazawa,et al.  pixelNeRF: Neural Radiance Fields from One or Few Images , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Li Zhang,et al.  Soft 3D reconstruction for view synthesis , 2017, ACM Trans. Graph..

[19]  Gernot Riegler,et al.  Free View Synthesis , 2020, ECCV.

[20]  George Drettakis,et al.  Depth synthesis and local warps for plausible image-based navigation , 2013, TOGS.

[21]  Ren Ng,et al.  PlenOctrees for Real-time Rendering of Neural Radiance Fields , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Jing Xu,et al.  Point-Based Multi-View Stereo Network , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Zehao Yu,et al.  Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[25]  Engin Tola,et al.  Large Scale Multiview Stereopsis Evaluation , 2014 .

[26]  Kalyan Sunkavalli,et al.  Deep view synthesis from sparse photometric images , 2019, ACM Trans. Graph..

[27]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Pratul P. Srinivasan,et al.  IBRNet: Learning Multi-View Image-Based Rendering , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  M. Zollhöfer,et al.  Learning Dynamic Textures for Neural Rendering of Human Actors , 2020, IEEE Transactions on Visualization and Computer Graphics.

[30]  Long Quan,et al.  MVSNet: Depth Inference for Unstructured Multi-view Stereo , 2018, ECCV.

[31]  Ting-Chun Wang,et al.  Learning-based view synthesis for light field cameras , 2016, ACM Trans. Graph..

[32]  Jonathan T. Barron,et al.  Baking Neural Radiance Fields for Real-Time View Synthesis , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[33]  Christian Theobalt,et al.  Real-time deep dynamic characters , 2021, ACM Transactions on Graphics.

[34]  Frédo Durand,et al.  Unstructured Light Fields , 2012, Comput. Graph. Forum.

[35]  Tao Guan,et al.  P-MVSNet: Learning Patch-Wise Matching Confidence Aggregation for Multi-View Stereo , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[36]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[37]  Yinda Zhang,et al.  DIST: Rendering Deep Implicit Signed Distance Function With Differentiable Sphere Tracing , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Hujun Bao,et al.  Neural Body: Implicit Neural Representations with Structured Latent Codes for Novel View Synthesis of Dynamic Humans , 2020, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Stephan J. Garbin,et al.  FastNeRF: High-Fidelity Neural Rendering at 200FPS , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[40]  Gordon Wetzstein,et al.  Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[41]  Marc Levoy,et al.  Light field rendering , 1996, SIGGRAPH.

[42]  Yiyi Liao,et al.  KiloNeRF: Speeding up Neural Radiance Fields with Thousands of Tiny MLPs , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[43]  Siyu Zhu,et al.  Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Justus Thies,et al.  Deferred Neural Rendering: Image Synthesis using Neural Textures , 2019 .

[46]  Jan-Michael Frahm,et al.  Pixelwise View Selection for Unstructured Multi-View Stereo , 2016, ECCV.

[47]  Richard A. Newcombe,et al.  DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Andreas Geiger,et al.  Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Gordon Wetzstein,et al.  DeepVoxels: Learning Persistent 3D Feature Embeddings , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.