Relighting Images in the Wild with a Self-Supervised Siamese Auto-Encoder

We propose a self-supervised method for relighting single-view images in the wild. The method is based on an auto-encoder that decomposes an image into two separate encodings, one for the scene illumination and one for the scene content. To disentangle these embeddings without supervision, we exploit the assumption that some augmentation operations leave the image content unchanged and affect only the direction of the light. We introduce a novel loss function, called the spherical harmonic loss, that forces the illumination embedding to take the form of a spherical harmonic vector. We train our model on large-scale datasets such as YouTube-8M and CelebA. Our experiments show that our method can correctly estimate scene illumination and realistically relight input images, without any supervision or a prior shape model. Compared to supervised methods, our approach achieves similar performance and avoids common lighting artifacts.
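As an illustration of how an augmentation can change only the lighting direction, the sketch below mirrors a second-order spherical harmonic (SH) lighting vector under a horizontal image flip and compares it with a prediction for the flipped image. It is a minimal sketch under stated assumptions, not the loss defined in the paper: the 9-term real SH basis, its ordering and constants, and the names `sh_basis`, `flip_sh_horizontal`, `shade`, and `sh_flip_loss` are all hypothetical choices made for this example.

```python
# Assumed convention: 9 real SH basis functions up to order 2, evaluated at a
# unit direction (x, y, z). The paper's own ordering and scaling may differ.
def sh_basis(x, y, z):
    return [
        0.282095,                        # constant term
        0.488603 * y,
        0.488603 * z,
        0.488603 * x,                    # odd in x
        1.092548 * x * y,                # odd in x
        1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,                # odd in x
        0.546274 * (x * x - y * y),
    ]

# Indices of the basis functions that change sign when x is mirrored.
X_ODD = (3, 4, 7)

def flip_sh_horizontal(coeffs):
    """Lighting vector that corresponds to a horizontally flipped image."""
    return [-c if i in X_ODD else c for i, c in enumerate(coeffs)]

def shade(coeffs, direction):
    """Shading value for one surface direction under SH lighting."""
    x, y, z = direction
    return sum(c * b for c, b in zip(coeffs, sh_basis(x, y, z)))

def sh_flip_loss(pred_original, pred_flipped):
    """Hypothetical consistency term: mean squared error between the
    analytically flipped lighting of the original image and the lighting
    predicted for the flipped image."""
    target = flip_sh_horizontal(pred_original)
    return sum((t - p) ** 2 for t, p in zip(target, pred_flipped)) / len(target)
```

Under these assumptions, shading a mirrored direction with the flipped coefficients gives the same value as shading the original direction with the original coefficients, which is the property such a consistency term relies on.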
