Object-Centric Neural Scene Rendering

We present a method for composing photorealistic scenes from captured images of objects. Our work builds upon neural radiance fields (NeRFs), which implicitly model the volumetric density and directionally-emitted radiance of a scene. While NeRFs synthesize realistic pictures, they only model static scenes and are closely tied to specific imaging conditions. This property makes NeRFs hard to generalize to new scenarios, including new lighting or new arrangements of objects. Instead of learning a scene radiance field as a NeRF does, we propose to learn object-centric neural scattering functions (OSFs), a representation that models per-object light transport implicitly using a lighting- and view-dependent neural network. This enables rendering scenes even when objects or lights move, without retraining. Combined with a volumetric path tracing procedure, our framework is capable of rendering both intra- and inter-object light transport effects including occlusions, specularities, shadows, and indirect illumination. We evaluate our approach on scene composition and show that it generalizes to novel illumination conditions, producing photorealistic, physically accurate renderings of multi-object scenes.

[1]  Philippe Bekaert,et al.  Advanced global illumination , 2006 .

[2]  Yannick Hold-Geoffroy,et al.  Neural Reflectance Fields for Appearance Acquisition , 2020, ArXiv.

[3]  Kalyan Sunkavalli,et al.  Deep image-based relighting from optimal sparse samples , 2018, ACM Trans. Graph..

[4]  Noah Snavely,et al.  Neural Rerendering in the Wild , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Jan Kautz,et al.  Precomputed radiance transfer for real-time rendering in dynamic, low-frequency lighting environments , 2002 .

[6]  Jonathan T. Barron,et al.  Pushing the Boundaries of View Extrapolation With Multiplane Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Alexei A. Efros,et al.  The Unreasonable Effectiveness of Deep Features as a Perceptual Metric , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  Graham Fyffe,et al.  Stereo Magnification: Learning View Synthesis using Multiplane Images , 2018, ArXiv.

[9]  Yong-Liang Yang,et al.  RenderNet: A deep convolutional network for differentiable rendering from 3D shapes , 2018, NeurIPS.

[10]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[11]  Yannick Hold-Geoffroy,et al.  Deep Reflectance Volumes: Relightable Reconstructions from Multi-View Photometric Images , 2020, ECCV.

[12]  Gordon Wetzstein,et al.  Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations , 2019, NeurIPS.

[13]  Yoshua Bengio,et al.  On the Spectral Bias of Neural Networks , 2018, ICML.

[14]  Yong-Liang Yang,et al.  BlockGAN: Learning 3D Object-aware Scene Representations from Unlabelled Images , 2020, NeurIPS.

[15]  George Drettakis,et al.  Multi-view relighting using a geometry-aware network , 2019, ACM Trans. Graph..

[16]  Jan-Michael Frahm,et al.  Deep blending for free-viewpoint image-based rendering , 2018, ACM Trans. Graph..

[17]  Andreas Geiger,et al.  Learning Implicit Surface Light Fields , 2020, 2020 International Conference on 3D Vision (3DV).

[18]  Victor Lempitsky,et al.  Neural Point-Based Graphics , 2019, ECCV.

[19]  Tat-Seng Chua,et al.  Neural Sparse Voxel Fields , 2020, NeurIPS.

[20]  Jonathan T. Barron,et al.  NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis , 2020, ECCV.

[21]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[23]  Yong-Liang Yang,et al.  HoloGAN: Unsupervised Learning of 3D Representations From Natural Images , 2019, 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW).

[24]  Nelson L. Max,et al.  Optical Models for Direct Volume Rendering , 1995, IEEE Trans. Vis. Comput. Graph..

[25]  Joe Michael Kniss,et al.  A Model for Volume Lighting and Modeling , 2003, IEEE Trans. Vis. Comput. Graph..

[26]  Ravi Ramamoorthi,et al.  Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines , 2019 .

[27]  David W. Jacobs,et al.  Deep Single-Image Portrait Relighting , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Theodore Lim,et al.  Generative and Discriminative Voxel Modeling with Convolutional Neural Networks , 2016, ArXiv.

[29]  James T. Kajiya,et al.  The rendering equation , 1986, SIGGRAPH.

[30]  Gavin S. P. Miller,et al.  Efficient algorithms for local and global accessibility shading , 1994, SIGGRAPH.

[31]  Jitendra Malik,et al.  Habitat: A Platform for Embodied AI Research , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[32]  Vittorio Ferrari,et al.  Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Silvio Savarese,et al.  Interactive Gibson Benchmark: A Benchmark for Interactive Navigation in Cluttered Environments , 2019, IEEE Robotics and Automation Letters.

[34]  Andrew W. Fitzgibbon,et al.  Bundle Adjustment - A Modern Synthesis , 1999, Workshop on Vision Algorithms.

[35]  Gordon Wetzstein,et al.  DeepVoxels: Learning Persistent 3D Feature Embeddings , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Yun-Ta Tsai,et al.  Single image portrait relighting , 2019, ACM Trans. Graph..

[37]  Marc Levoy,et al.  Efficient ray tracing of volume data , 1990, TOGS.

[38]  Johannes Hanika,et al.  Monte Carlo Methods for Volumetric Light Transport Simulation , 2018, Comput. Graph. Forum.

[39]  Steven M. Seitz,et al.  LookinGood , 2018, ACM Trans. Graph..

[40]  Shree K. Nayar,et al.  Light field transfer: global illumination between real and synthetic objects , 2008, ACM Trans. Graph..

[41]  Justus Thies,et al.  Deferred Neural Rendering: Image Synthesis using Neural Textures , 2019 .

[42]  Bruno Arnaldi,et al.  On the Division of Environments by Virtual Walls for Radiosity Computation , 1994 .

[43]  Zhou Wang,et al.  Multiscale structural similarity for image quality assessment , 2003, The Thrity-Seventh Asilomar Conference on Signals, Systems & Computers, 2003.

[44]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[45]  Yun-Ta Tsai,et al.  Neural Light Transport for Relighting and View Synthesis , 2020, ACM Transactions on Graphics.