Inverse Rendering for Complex Indoor Scenes: Shape, Spatially-Varying Lighting and SVBRDF From a Single Image

We propose a deep inverse rendering framework for indoor scenes. From a single RGB image of an arbitrary indoor scene, we obtain a complete scene reconstruction, estimating shape, spatially-varying lighting, and spatially-varying, non-Lambertian surface reflectance. Our novel inverse rendering network incorporates physical insights (a spatially-varying spherical Gaussian lighting representation, a differentiable rendering layer to model scene appearance, a cascade structure to iteratively refine the predictions, and a bilateral solver for refinement), allowing us to jointly reason about shape, lighting, and reflectance. Since no existing dataset provides high-quality ground truth for spatially-varying materials and spatially-varying lighting, we propose novel methods to map complex materials onto existing indoor scene datasets and a new physically-based GPU renderer to create a large-scale, photorealistic indoor dataset. Experiments show that our framework outperforms previous methods and enables novel applications such as photorealistic object insertion and material editing.
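To make the spherical Gaussian lighting representation concrete, the following is a minimal sketch of how incoming radiance at one scene location could be evaluated from a mixture of spherical Gaussian lobes, using the standard form G(v) = μ · exp(λ (v · ξ − 1)) with lobe axis ξ, sharpness λ, and RGB amplitude μ. The function names, the tuple layout of a lobe, and the example values are assumptions for illustration; the abstract does not specify the number of lobes or how the network parameterizes them. In a spatially-varying setting, each pixel or surface point would carry its own set of lobes.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def eval_sg_lighting(direction, lobes):
    """Evaluate RGB radiance arriving from `direction` for one scene location.

    `lobes` is a list of (axis, sharpness, rgb_amplitude) tuples; this
    layout is a hypothetical choice made for the sketch.
    """
    d = normalize(direction)
    rgb = [0.0, 0.0, 0.0]
    for axis, sharpness, amplitude in lobes:
        a = normalize(axis)
        cos_angle = sum(x * y for x, y in zip(d, a))
        # Standard spherical Gaussian falloff: 1 on the axis, decaying away from it.
        weight = math.exp(sharpness * (cos_angle - 1.0))
        for i in range(3):
            rgb[i] += amplitude[i] * weight
    return tuple(rgb)

# Hypothetical lighting at one location: a sharp warm lobe from above
# plus a broad, dim bluish lobe from the side.
example_lobes = [
    ((0.0, 0.0, 1.0), 10.0, (2.0, 1.0, 0.5)),
    ((1.0, 0.0, 0.0), 1.0, (0.1, 0.1, 0.2)),
]
radiance_up = eval_sg_lighting((0.0, 0.0, 1.0), example_lobes)
```

A representation like this is compact (a handful of parameters per lobe) and smooth in its parameters, which is one reason it pairs naturally with a differentiable rendering layer.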
