GLoSH: Global-Local Spherical Harmonics for Intrinsic Image Decomposition

Traditional intrinsic image decomposition splits an image into reflectance and shading, leaving surface normals and lighting entangled in the shading. In this work, we propose a Global-Local Spherical Harmonics (GLoSH) lighting model to improve the lighting component, and jointly predict reflectance and surface normals. The global SH models the holistic lighting, while the local SH accounts for the spatial variation of lighting. We also propose a novel non-negative lighting constraint that encourages the estimated SH to be physically meaningful. To mirror the GLoSH model, we design a coarse-to-fine network structure: the coarse network predicts global SH, reflectance and normals, and the fine network predicts their local residuals. Since labels for reflectance and lighting are lacking, we pre-train the model on synthetic data and fine-tune it on real data in a self-supervised way. Compared with state-of-the-art methods that target only normals or only reflectance and shading, our method recovers all components and achieves consistently better results on three real datasets: IIW, SAW and NYUv2.
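To make the global-plus-local idea concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes second-order real SH (9 coefficients), a single-channel shading in which the Lambertian kernel is folded into the coefficients, and local SH expressed as a per-pixel residual added to the global coefficients. The function names `sh_basis`, `glosh_shading` and `negative_lighting_penalty` are hypothetical, and the penalty is only one plausible way to discourage negative lighting; the abstract does not specify the actual formulation.

```python
import numpy as np

def sh_basis(dirs):
    """Evaluate the 9 real SH basis functions (bands 0-2) at unit vectors (..., 3)."""
    x, y, z = dirs[..., 0], dirs[..., 1], dirs[..., 2]
    return np.stack([
        0.282095 * np.ones_like(x),      # band 0
        0.488603 * y,                    # band 1
        0.488603 * z,
        0.488603 * x,
        1.092548 * x * y,                # band 2
        1.092548 * y * z,
        0.315392 * (3.0 * z**2 - 1.0),
        1.092548 * x * z,
        0.546274 * (x**2 - y**2),
    ], axis=-1)

def glosh_shading(normals, global_sh, local_sh):
    """Shading from global SH plus a per-pixel local residual (assumed form).

    normals:   (H, W, 3) unit surface normals
    global_sh: (9,) coefficients shared by the whole image
    local_sh:  (H, W, 9) per-pixel residual coefficients
    """
    coeffs = global_sh + local_sh        # broadcasts to (H, W, 9)
    return np.sum(sh_basis(normals) * coeffs, axis=-1)

def negative_lighting_penalty(sh, n_samples=256, seed=0):
    """Hypothetical penalty: mean magnitude of negative values of the SH
    function, sampled at random directions on the unit sphere."""
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_samples, 3))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    values = sh_basis(dirs) @ sh
    return np.mean(np.maximum(-values, 0.0))
```

Under these assumptions, an image would be re-rendered as reflectance multiplied by `glosh_shading(...)`, and the penalty would be added to the training loss so that the predicted SH stays close to a non-negative function.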
