Single Image Intrinsic Decomposition Without a Single Intrinsic Image

Intrinsic image decomposition, the task of decomposing a natural image into a set of images corresponding to different physical causes, is one of the fundamental problems of computer vision. Previous intrinsic decomposition approaches either address the problem in a fully supervised manner or require multiple images of the same scene as input. Both are less desirable in practice: ground truth intrinsic images are extremely difficult to acquire, and the requirement of multiple images severely limits the applicable scenarios. In this paper, we propose to combine the best of both worlds. We present a two-stream convolutional neural network framework that is capable of learning the decomposition effectively in the absence of any ground truth intrinsic images, and that can be easily extended to a (semi-)supervised setup. At inference time, our model can be easily reduced to a single-stream module that performs intrinsic decomposition on a single input image. We demonstrate the effectiveness of our framework through an extensive experimental study on both synthetic and real-world datasets, showing superior performance over previous approaches in both single-image and multi-image settings. Notably, our approach outperforms previous state-of-the-art single-image methods while using only 50% of the ground truth supervision.
