Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination

We present a deep learning solution for estimating the incident illumination at any 3D location within a scene from an input narrow-baseline stereo image pair. Previous approaches for predicting global illumination from images either predict just a single illumination for the entire scene, or separately estimate the illumination at each 3D location without enforcing that the predictions are consistent with the same 3D scene. Instead, we propose a deep learning model that estimates a 3D volumetric RGBA model of a scene, including content outside the observed field of view, and then uses standard volume rendering to estimate the incident illumination at any 3D location within that volume. Our model is trained without any ground truth 3D data and only requires a held-out perspective view near the input stereo pair and a spherical panorama taken within each scene as supervision, as opposed to prior methods for spatially-varying lighting estimation, which require ground truth scene geometry for training. We demonstrate that our method can predict consistent spatially-varying lighting that is convincing enough to plausibly relight and insert highly specular virtual objects into real images.

[1]  Yannick Hold-Geoffroy,et al.  Deep Outdoor Illumination Estimation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Richard Szeliski,et al.  Stereo Matching with Transparency and Matting , 1999, International Journal of Computer Vision.

[3]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[4]  Kalyan Sunkavalli,et al.  Deep view synthesis from sparse photometric images , 2019, ACM Trans. Graph..

[5]  Andrea Fusiello Image-based Rendering * , 2003 .

[6]  Harry Shum,et al.  Plenoptic sampling , 2000, SIGGRAPH.

[7]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8]  K. Hohn,et al.  Determining Lightness from an Image , 2004 .

[9]  Geoffrey E. Hinton,et al.  Layer Normalization , 2016, ArXiv.

[10]  Jan Kautz,et al.  Neural Inverse Rendering of an Indoor Scene From a Single Image , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[11]  Gabriel Eilertsen,et al.  HDR image reconstruction from a single exposure using deep CNNs , 2017, ACM Trans. Graph..

[12]  Paul E. Debevec,et al.  Rendering synthetic objects into real scenes: bridging traditional and image-based graphics with global illumination and high dynamic range photography , 1998, SIGGRAPH '08.

[13]  Wenbin Li,et al.  InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset , 2018, BMVC.

[14]  H. Barrow,et al.  RECOVERING INTRINSIC SCENE CHARACTERISTICS FROM IMAGES , 1978 .

[15]  Jonathan T. Barron,et al.  Scene Intrinsics and Depth from a Single Image , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[16]  Kalyan Sunkavalli,et al.  Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Thomas Funkhouser,et al.  Neural Illumination: Lighting Prediction for Indoor Environments , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Kenny Mitchell,et al.  From Faces to Outdoor Light Probes , 2018, Comput. Graph. Forum.

[19]  Ersin Yumer,et al.  Learning to predict indoor illumination from a single image , 2017, ACM Trans. Graph..

[20]  Michael Bosse,et al.  Unstructured lumigraph rendering , 2001, SIGGRAPH.

[21]  M. Levoy,et al.  Fast volume rendering using a shear-warp factorization of the viewing transformation , 1994, SIGGRAPH.

[22]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Graham Fyffe,et al.  Stereo Magnification: Learning View Synthesis using Multiplane Images , 2018, ArXiv.

[24]  Kalyan Sunkavalli,et al.  Automatic Scene Inference for 3D Object Compositing , 2014, ACM Trans. Graph..

[25]  Wan-Chun Ma,et al.  DeepLight: Learning Illumination for Unconstrained Mobile Mixed Reality , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  LalondeJean-François,et al.  Learning to predict indoor illumination from a single image , 2017 .

[27]  Ravi Ramamoorthi,et al.  Local Light Field Fusion: Practical View Synthesis with Prescriptive Sampling Guidelines , 2019 .

[28]  Jitendra Malik,et al.  Modeling and Rendering Architecture from Photographs: A hybrid geometry- and image-based approach , 1996, SIGGRAPH.

[29]  Richard Szeliski,et al.  Layered depth images , 1998, SIGGRAPH.

[30]  Alexei A. Efros,et al.  Estimating natural illumination from a single outdoor image , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[31]  Jonathan T. Barron,et al.  Pushing the Boundaries of View Extrapolation With Multiplane Images , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Taesung Park,et al.  Semantic Image Synthesis With Spatially-Adaptive Normalization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Yuichi Yoshida,et al.  Spectral Normalization for Generative Adversarial Networks , 2018, ICLR.

[34]  Yannick Hold-Geoffroy,et al.  Deep Parametric Indoor Lighting Estimation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[35]  Kalyan Sunkavalli,et al.  Fast Spatially-Varying Indoor Lighting Estimation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Marc Levoy,et al.  Display of surfaces from volume data , 1988, IEEE Computer Graphics and Applications.

[37]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[38]  Pat Hanrahan,et al.  Volume Rendering , 2020, Definitions.

[39]  Gordon Wetzstein,et al.  DeepVoxels: Learning Persistent 3D Feature Embeddings , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  John Flynn,et al.  Deep Stereo: Learning to Predict New Views from the World's Imagery , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Vladlen Koltun,et al.  Photographic Image Synthesis with Cascaded Refinement Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).