论文信息 - STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation

STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation

Recently, significant improvement has been made on semantic object segmentation due to the development of deep convolutional neural networks (DCNNs). Training such a DCNN usually relies on a large number of images with pixel-level segmentation masks, and annotating these images is very costly in terms of both finance and human effort. In this paper, we propose a simple to complex (STC) framework in which only image-level annotations are utilized to learn DCNNs for semantic segmentation. Specifically, we first train an initial segmentation network called Initial-DCNN with the saliency maps of simple images (i.e., those with a single category of major object(s) and clean background). These saliency maps can be automatically obtained by existing bottom-up salient object detection techniques, where no supervision information is needed. Then, a better network called Enhanced-DCNN is learned with supervision from the predicted segmentation masks of simple images based on the Initial-DCNN as well as the image-level annotations. Finally, more pixel-level segmentation masks of complex images (two or more categories of objects with cluttered background), which are inferred by using Enhanced-DCNN and image-level annotations, are utilized as the supervision information to learn the Powerful-DCNN for semantic segmentation. Our method utilizes 40K simple images from Flickr.com and 10K complex images from PASCAL VOC for step-wisely boosting the segmentation network. Extensive experimental results on PASCAL VOC 2012 segmentation benchmark well demonstrate the superiority of the proposed STC framework compared with other state-of-the-arts.

[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[2] Jia Xu,et al. Tell Me What You See and I Will Show You Where It Is , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[3] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4] Lars Petersson,et al. Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation , 2016, ECCV.

[5] Philip H. S. Torr,et al. BING: Binarized normed gradients for objectness estimation at 300fps , 2014, Computational Visual Media.

[6] Antonio Criminisi,et al. TextonBoost for Image Understanding: Multi-Class Object Recognition and Segmentation by Jointly Modeling Texture, Layout, and Context , 2007, International Journal of Computer Vision.

[7] Fei-Fei Li,et al. Shifting Weights: Adapting Object Detectors from Image to Video , 2012, NIPS.

[8] Iasonas Kokkinos,et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[9] Trevor Darrell,et al. Fully Convolutional Multi-Class Multiple Instance Learning , 2014, ICLR.

[10] Tomás Lozano-Pérez,et al. A Framework for Multiple-Instance Learning , 1997, NIPS.

[11] Wataru Shimoda,et al. Distinct Class-Specific Saliency Maps for Weakly Supervised Semantic Segmentation , 2016, ECCV.

[12] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[13] Bingbing Ni,et al. HCP: A Flexible CNN Framework for Multi-Label Image Classification , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14] Joachim M. Buhmann,et al. Weakly supervised structured output learning for semantic segmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[15] Jitendra Malik,et al. Simultaneous Detection and Segmentation , 2014, ECCV.

[16] Shi-Min Hu,et al. Global contrast based salient region detection , 2011, CVPR 2011.

[17] Jia Xu,et al. Learning to segment under various forms of weak supervision , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Yunchao Wei,et al. Computational Baby Learning , 2014, ArXiv.

[19] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20] Daphne Koller,et al. Self-Paced Learning for Latent Variable Models , 2010, NIPS.

[21] Li Xu,et al. Hierarchical Saliency Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[22] Li Xu,et al. Hierarchical Image Saliency Detection on Extended CSSD , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Jonathan T. Barron,et al. Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[25] Changsheng Xu,et al. Weakly Supervised Graph Propagation Towards Collective Image Parsing , 2012, IEEE Transactions on Multimedia.

[26] Jian Sun,et al. Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27] Jian Sun,et al. BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[28] Shi-Min Hu,et al. Sketch2Photo: internet image montage , 2009, ACM Trans. Graph..

[29] Christof Koch,et al. A Model of Saliency-Based Visual Attention for Rapid Scene Analysis , 2009 .

[30] Jingdong Wang,et al. Salient Object Detection: A Discriminative Regional Feature Integration Approach , 2013, International Journal of Computer Vision.

[31] Shi-Min Hu,et al. SalientShape: group saliency in image collections , 2013, The Visual Computer.

[32] George Papandreou,et al. Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[33] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[34] William T. Freeman,et al. Annotation Propagation in Large Image Databases via Dense Image Correspondence , 2012, ECCV.

[35] Jian Dong,et al. Semantic Segmentation without Annotating Segments , 2013, 2013 IEEE International Conference on Computer Vision.

[36] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37] Trevor Darrell,et al. Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[38] Vladimir Kolmogorov,et al. "GrabCut": interactive foreground extraction using iterated graph cuts , 2004, ACM Trans. Graph..

[39] Christoph H. Lampert,et al. Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation , 2016, ECCV.

[40] Luc Van Gool,et al. The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[41] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[42] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[43] Ali Borji,et al. Salient Object Detection: A Benchmark , 2015, IEEE Transactions on Image Processing.

[44] Ronan Collobert,et al. From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45] Vibhav Vineet,et al. Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46] Yunchao Wei,et al. Towards Computational Baby Learning: A Weakly-Supervised Approach for Object Detection , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[47] Antonio Torralba,et al. Nonparametric scene parsing: Label transfer via dense scene alignment , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[48] Subhransu Maji,et al. Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[49] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[50] Shiguang Shan,et al. Self-Paced Learning with Diversity , 2014, NIPS.