CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering

Intrinsic image decomposition is a challenging, long-standing computer vision problem for which ground truth data is very difficult to acquire. We explore the use of synthetic data for training CNN-based intrinsic image decomposition models, then applying these learned models to real-world images. To that end, we present CGIntrinsics, a new, large-scale dataset of physically-based rendered images of scenes with full ground truth decompositions. The rendering process we use is carefully designed to yield high-quality, realistic images, which we find to be crucial for this problem domain. We also propose a new end-to-end training method that learns better decompositions by leveraging CGIntrinsics, and optionally IIW and SAW, two recent datasets of sparse annotations on real-world images. Surprisingly, we find that a decomposition network trained solely on our synthetic data outperforms the state-of-the-art on both IIW and SAW, and performance improves even further when IIW and SAW data is added during training. Our work demonstrates the suprising effectiveness of carefully-rendered synthetic data for the intrinsic images task.

[1]  Adolfo Muñoz,et al.  Intrinsic Images by Clustering , 2012, Comput. Graph. Forum.

[2]  Jitendra Malik,et al.  Shape, Illumination, and Reflectance from Shading , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Noah Snavely,et al.  Intrinsic images in the wild , 2014, ACM Trans. Graph..

[4]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[5]  Balazs Kovacs,et al.  Shading Annotations in the Wild , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jonathan T. Barron,et al.  Fast bilateral-space stereo for synthetic defocus , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Balazs Kovacs,et al.  Intrinsic Decompositions for Image Editing , 2017, Comput. Graph. Forum.

[8]  Derek Hoiem,et al.  Indoor Segmentation and Support Inference from RGBD Images , 2012, ECCV.

[9]  Stella X. Yu,et al.  Direct Intrinsics: Learning Albedo-Shading Decomposition by Convolutional Regression , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jitendra Malik,et al.  Intrinsic Scene Properties from a Single RGB-D Image , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Yizhou Yu,et al.  An L1 image transform for edge-preserving smoothing and scene-level intrinsic decomposition , 2015, ACM Trans. Graph..

[13]  Leonidas J. Guibas,et al.  ShapeNet: An Information-Rich 3D Model Repository , 2015, ArXiv.

[14]  Peter V. Gehler,et al.  Reflectance Adaptive Filtering Improves Intrinsic Image Estimation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Michael J. Black,et al.  A Naturalistic Open Source Movie for Optical Flow Evaluation , 2012, ECCV.

[16]  Zhengqi Li,et al.  Learning Intrinsic Image Decomposition from Watching the World , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  E. Land,et al.  Lightness and retinex theory. , 1971, Journal of the Optical Society of America.

[18]  Rob Fergus,et al.  Depth Map Prediction from a Single Image using a Multi-Scale Deep Network , 2014, NIPS.

[19]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[21]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Edward H. Adelson,et al.  Ground truth dataset and baseline evaluations for intrinsic image algorithms , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[23]  Stephen Lin,et al.  A Closed-Form Solution to Retinex with Nonlocal Texture Constraints , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Vladlen Koltun,et al.  A Simple Model for Intrinsic Image Decomposition with Depth Cues , 2013, 2013 IEEE International Conference on Computer Vision.

[25]  Jian Shi,et al.  Learning Non-Lambertian Object Intrinsics Across ShapeNet Categories , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Alexei A. Efros,et al.  Learning Data-Driven Reflectance Priors for Intrinsic Image Decomposition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Joost van de Weijer,et al.  Intrinsic image evaluation on synthetic complex scenes , 2013, 2013 IEEE International Conference on Image Processing.

[28]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Vladlen Koltun,et al.  Playing for Benchmarks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Stephen Lin,et al.  Unified Depth Prediction and Intrinsic Image Decomposition from a Single Image via Joint Convolutional Neural Fields , 2016, ECCV.

[31]  Peter V. Gehler,et al.  Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance , 2011, NIPS.

[32]  Seungyong Lee,et al.  Intrinsic Image Decomposition Using Structure-Texture Separation and Surface Normals , 2014, ECCV.

[33]  Ersin Yumer,et al.  Neural Face Editing with Intrinsic Image Disentangling , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Chuohao Yeo,et al.  Intrinsic images decomposition using a local and global sparse representation of reflectance , 2011, CVPR 2011.

[35]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[36]  Erik Reinhard,et al.  Photographic tone reproduction for digital images , 2002, ACM Trans. Graph..

[37]  Ersin Yumer,et al.  Physically-Based Rendering for Indoor Scene Understanding Using Convolutional Neural Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[38]  Pascal Fua,et al.  SLIC Superpixels Compared to State-of-the-Art Superpixel Methods , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  William T. Freeman,et al.  Learning Ordinal Relationships for Mid-Level Vision , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[40]  Thomas A. Funkhouser,et al.  Semantic Scene Completion from a Single Depth Image , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Stella X. Yu,et al.  Learning lightness from human judgement on relative reflectance , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Jiajun Wu,et al.  Self-Supervised Intrinsic Image Decomposition , 2017, NIPS.