Direct Intrinsics: Learning Albedo-Shading Decomposition by Convolutional Regression

We introduce a new approach to intrinsic image decomposition, the task of decomposing a single image into albedo and shading components. Our strategy, which we term direct intrinsics, is to learn a convolutional neural network (CNN) that directly predicts output albedo and shading channels from an input RGB image patch. Direct intrinsics is a departure from classical techniques for intrinsic image decomposition, which typically rely on physically motivated priors and graph-based inference algorithms. The large-scale synthetic ground truth of the MPI Sintel dataset plays a key role in training direct intrinsics. We demonstrate results on both the synthetic images of Sintel and the real images of the classic MIT intrinsic image dataset. On Sintel, direct intrinsics, using only RGB input, outperforms all prior work, including methods that rely on RGB+Depth input. Direct intrinsics also generalizes across modalities: our Sintel-trained CNN produces quite reasonable decompositions on the real images of the MIT dataset. Our results indicate that the marriage of CNNs with synthetic training data may be a powerful new technique for tackling classic problems in computer vision.
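To make the "direct" input/output contract concrete, here is a minimal pure-Python sketch. It is not the authors' network. Everything specific in it is a hypothetical assumption and does not come from the abstract: the toy architecture (two 3x3 convolutions with a ReLU, producing 6 output channels read as 3 albedo plus 3 shading channels), the untrained random weights, the plain L2 regression loss, and the common multiplicative model I = A * S used for recomposition. The sketch only shows the shape of the mapping from an RGB patch to albedo and shading maps. It does not reproduce the method's results.

```python
import random
from typing import List, Tuple

# An image is a nested list indexed [row][col][channel].
Image = List[List[List[float]]]


def conv2d_same(x: Image, weights, bias) -> Image:
    """'Same'-padded 2D convolution (cross-correlation, as in CNN libraries).

    weights[out_c][in_c][dy][dx] is a k x k kernel for each channel pair;
    positions outside the patch are treated as zeros.
    """
    h, w, c_in = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(weights), len(weights[0][0])
    r = k // 2
    out = [[[0.0] * c_out for _ in range(w)] for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for o in range(c_out):
                acc = bias[o]
                for ci in range(c_in):
                    for dy in range(k):
                        for dx in range(k):
                            y, xx = i + dy - r, j + dx - r
                            if 0 <= y < h and 0 <= xx < w:
                                acc += weights[o][ci][dy][dx] * x[y][xx][ci]
                out[i][j][o] = acc
    return out


def relu(x: Image) -> Image:
    return [[[max(0.0, v) for v in px] for px in row] for row in x]


def random_params(c_in: int, c_hidden: int, c_out: int, k: int = 3, seed: int = 0):
    """Hypothetical untrained weights for a two-layer toy network."""
    rng = random.Random(seed)

    def layer(ci, co):
        w = [[[[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(k)]
              for _ in range(ci)] for _ in range(co)]
        return w, [0.0] * co

    return [layer(c_in, c_hidden), layer(c_hidden, c_out)]


def direct_intrinsics_toy(patch: Image, params) -> Tuple[Image, Image]:
    """Regress albedo and shading straight from RGB (hypothetical toy net).

    Output channels 0-2 are read as albedo and channels 3-5 as shading.
    """
    (w1, b1), (w2, b2) = params
    hidden = relu(conv2d_same(patch, w1, b1))
    out = conv2d_same(hidden, w2, b2)
    albedo = [[px[:3] for px in row] for row in out]
    shading = [[px[3:] for px in row] for row in out]
    return albedo, shading


def reconstruct(albedo: Image, shading: Image) -> Image:
    """Recompose an image under the assumed model I = A * S (per pixel, per channel)."""
    return [[[a * s for a, s in zip(pa, ps)] for pa, ps in zip(ra, rs)]
            for ra, rs in zip(albedo, shading)]


def mse(pred: Image, target: Image) -> float:
    """Mean squared error over all pixels and channels (plain L2 regression loss)."""
    total, n = 0.0, 0
    for rp, rt in zip(pred, target):
        for pp, pt in zip(rp, rt):
            for a, b in zip(pp, pt):
                total += (a - b) ** 2
                n += 1
    return total / n


if __name__ == "__main__":
    rng = random.Random(1)
    patch = [[[rng.random() for _ in range(3)] for _ in range(8)] for _ in range(8)]
    albedo, shading = direct_intrinsics_toy(patch, random_params(3, 4, 6))
    print("albedo shape:", len(albedo), len(albedo[0]), len(albedo[0][0]))
    print("shading shape:", len(shading), len(shading[0]), len(shading[0][0]))
    # Untrained weights, so this error is arbitrary; training would reduce mse
    # against ground-truth albedo/shading such as Sintel's.
    print("recomposition error:", mse(reconstruct(albedo, shading), patch))
```

The sketch leaves out training by gradient descent on albedo and shading targets to keep it short. A practical version would use a deep-learning framework, a deeper network, and batched tensors instead of nested lists.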
