Learning to predict indoor illumination from a single image

We propose an automatic method to infer high dynamic range (HDR) illumination from a single, limited field-of-view, low dynamic range (LDR) photograph of an indoor scene. In contrast to previous work that relies on specialized image capture, user input, and/or simple scene models, we train an end-to-end deep neural network that directly regresses a limited field-of-view photo to HDR illumination, without strong assumptions on scene geometry, material properties, or lighting. We show that this can be accomplished in a three-step process: 1) we train a robust lighting classifier to automatically annotate the location of light sources in a large dataset of LDR environment maps, 2) we use these annotations to train a deep neural network that predicts the location of lights in a scene from a single limited field-of-view photo, and 3) we fine-tune this network on a small dataset of HDR environment maps to predict light intensities. This allows us to automatically recover high-quality HDR illumination estimates that significantly outperform previous state-of-the-art methods. Consequently, using our illumination estimates for applications such as 3D object insertion produces photo-realistic results, which we validate via a perceptual user study.
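
To make the three-step training scheme concrete, below is a minimal PyTorch-style sketch, not the authors' implementation: a shared CNN encoder maps a limited field-of-view photo to a light-location probability map over an equirectangular environment map (step 2, supervised by step-1 annotations), and the same backbone is then fine-tuned with an intensity head on a small HDR dataset (step 3). All names (`LightNet`, `mask_head`, the layer sizes, and the placeholder data) are hypothetical assumptions for illustration.

```python
# Hypothetical sketch of the two-stage training described in the abstract
# (not the authors' code). A CNN maps a limited-FOV LDR photo to a
# light-location map over an environment map, then is fine-tuned on a
# small HDR dataset to also regress light intensities.
import torch
import torch.nn as nn

class LightNet(nn.Module):
    def __init__(self, env_h=64, env_w=128):
        super().__init__()
        self.env_h, self.env_w = env_h, env_w
        # Encoder: any image backbone would do; a small conv stack for brevity.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ELU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ELU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ELU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Head for step 2: per-pixel light-location logits on the env map.
        self.mask_head = nn.Linear(256, env_h * env_w)
        # Head for step 3: per-pixel HDR light intensity, used in fine-tuning.
        self.intensity_head = nn.Linear(256, env_h * env_w)

    def forward(self, photo):
        feat = self.encoder(photo)
        mask_logits = self.mask_head(feat).view(-1, 1, self.env_h, self.env_w)
        intensity = self.intensity_head(feat).view(-1, 1, self.env_h, self.env_w)
        return mask_logits, intensity

net = LightNet()
bce = nn.BCEWithLogitsLoss()  # light-location loss (step-1 annotations as targets)
l2 = nn.MSELoss()             # intensity loss (HDR panoramas as targets)
opt = torch.optim.Adam(net.parameters(), lr=1e-4)

# Step 2: large LDR dataset, light masks produced by the step-1 classifier.
photo = torch.randn(8, 3, 192, 256)             # placeholder photo batch
light_mask = torch.rand(8, 1, 64, 128).round()  # placeholder binary annotations
mask_logits, _ = net(photo)
bce(mask_logits, light_mask).backward()
opt.step(); opt.zero_grad()

# Step 3: small HDR dataset; fine-tune the backbone and intensity head.
hdr_target = torch.rand(8, 1, 64, 128) * 100.0  # placeholder HDR intensities
_, intensity = net(photo)
l2(intensity, hdr_target).backward()
opt.step(); opt.zero_grad()
```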
