Image-level to Pixel-wise Labeling: From Theory to Practice

Conventional convolutional neural networks (CNNs) have achieved great success in image semantic segmentation. Existing methods mainly focus on learning pixel-wise labels from an image directly. In this paper, we advocate tackling the pixel-wise segmentation problem by considering the image-level classification labels. Theoretically, we analyze and discuss the effects of image-level labels on pixel-wise segmentation from the perspective of information theory. In practice, an end-to-end segmentation model is built by fusing the image-level and pixel-wise labeling networks. A generative network is included to reconstruct the input image and further boost the segmentation model training with an auxiliary loss. Extensive experimental results on benchmark dataset demonstrate the effectiveness of the proposed method, where good image-level labels can significantly improve the pixel-wise segmentation accuracy.

[1]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[2]  Stefano Soatto,et al.  Class segmentation and object localization with superpixel neighborhoods , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[3]  Lin Ma,et al.  Learning to Answer Questions from Image Using Convolutional Neural Network , 2015, AAAI.

[4]  Yang Wang,et al.  Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation , 2016, ISVC.

[5]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[6]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Yi Yang,et al.  Attention to Scale: Scale-Aware Semantic Image Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[9]  Xiaogang Wang,et al.  Deep Dual Learning for Semantic Image Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Antonio Criminisi,et al.  TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[13]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[14]  Shuicheng Yan,et al.  Multi-Path Feedback Recurrent Neural Networks for Scene Parsing , 2016, AAAI.

[15]  George Papandreou,et al.  Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[16]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[18]  Xiaodong Yu,et al.  Learning Bidirectional Temporal Cues for Video-Based Person Re-Identification , 2018, IEEE Transactions on Circuits and Systems for Video Technology.

[19]  Xin Li,et al.  FoveaNet: Perspective-Aware Urban Scene Parsing , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20]  Shuicheng Yan,et al.  Tree-Structured Reinforcement Learning for Sequential Object Localization , 2016, NIPS.

[21]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[23]  Jun-Hai Yong,et al.  Fully Convolutional Neural Networks with Full-Scale-Features for Semantic Segmentation , 2017, AAAI.

[24]  Garrison W. Cottrell,et al.  Understanding Convolution for Semantic Segmentation , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).