Joint Learning of Intrinsic Images and Semantic Segmentation

Semantic segmentation of outdoor scenes is difficult when imaging conditions vary. Albedo (reflectance) is invariant to illumination effects, so using reflectance images for the semantic segmentation task can be favorable. Conversely, segmentation may also be useful for reflectance computation. Therefore, in this paper, semantic segmentation and intrinsic image decomposition are treated as a combined process, and their mutual relationship is explored jointly. To that end, we propose a supervised end-to-end CNN architecture that jointly learns intrinsic image decomposition and semantic segmentation, and we analyze the gains of addressing the two problems together. Moreover, new cascade CNN architectures for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as single tasks. Furthermore, a dataset of 35K synthetic images of natural environments is created with corresponding albedo and shading (intrinsics), as well as semantic labels (segmentation) assigned to each object/scene. The experiments show that, for natural scenes, joint learning of intrinsic image decomposition and semantic segmentation is beneficial for both tasks. Dataset and models are available at https://ivi.fnwi.uva.nl/cv/intrinseg.
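The invariance claim above can be illustrated with the standard Lambertian image formation model, in which each pixel is the product of albedo and shading. The following is a minimal sketch under that assumption, not the paper's method: the function names `compose` and `recover_albedo`, the 2x2 grids, and the two lighting conditions are hypothetical, and the shading is assumed to be known, which is not the case in practice (estimating it is the decomposition problem itself).

```python
# Toy illustration (assumption: Lambertian model, image = albedo * shading).
# All names and values here are hypothetical, chosen only for illustration.

def compose(albedo, shading):
    """Element-wise product of two equally sized 2D grids."""
    return [[a * s for a, s in zip(a_row, s_row)]
            for a_row, s_row in zip(albedo, shading)]

def recover_albedo(image, shading):
    """Divide out a known, non-zero shading to get the albedo back."""
    return [[i / s for i, s in zip(i_row, s_row)]
            for i_row, s_row in zip(image, shading)]

# Two materials: top row is brighter than the bottom row.
albedo = [[0.5, 0.5],
          [0.25, 0.25]]

# Two lighting conditions over the same scene.
sunny = [[1.0, 0.5],      # right column lies in a cast shadow
         [1.0, 0.5]]
overcast = [[0.5, 0.5],   # uniform, dimmer illumination
            [0.5, 0.5]]

img_sunny = compose(albedo, sunny)
img_overcast = compose(albedo, overcast)

# The observed images differ, but the underlying albedo is the same.
albedo_from_sunny = recover_albedo(img_sunny, sunny)
albedo_from_overcast = recover_albedo(img_overcast, overcast)
```

In the sunny image the shadowed pixel of the bright material has the same intensity as the lit pixel of the dark material, which is the kind of ambiguity that can mislead a segmentation model working on raw intensities; in the albedo grid the two materials remain distinct.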
