Webly Supervised Semantic Segmentation

We propose a weakly supervised semantic segmentation algorithm that uses image tags for supervision. We apply the tags in queries to collect three sets of web images, which encode the clean foregrounds, the common backgrounds, and realistic scenes of the classes. We introduce a novel three-stage training pipeline to progressively learn semantic segmentation models. We first train and refine a class-specific shallow neural network to obtain segmentation masks for each class. The shallow neural networks of all classes are then assembled into one deep convolutional neural network for end-to-end training and testing. Experiments show that our method notably outperforms previous state-of-the-art weakly supervised semantic segmentation approaches on the PASCAL VOC 2012 segmentation benchmark. We further apply the class-specific shallow neural networks to object segmentation and obtain excellent results.

[1]  Fei-Fei Li,et al.  What's the Point: Semantic Segmentation with Point Supervision , 2015, ECCV.

[2]  Ce Liu,et al.  Unsupervised Joint Object Discovery and Segmentation in Internet Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Huchuan Lu,et al.  Saliency Detection via Graph-Based Manifold Ranking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  B. S. Manjunath,et al.  Weakly Supervised Graph Based Semantic Segmentation by Learning Communities of Image-Parts , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[6]  C. Schmid,et al.  Hamming Embedding and Weak Geometry Consistency for Large Scale Image Search - extended version , 2008 .

[7]  John Salvatier,et al.  Theano: A Python framework for fast computation of mathematical expressions , 2016, ArXiv.

[8]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[10]  Xinlei Chen,et al.  Webly Supervised Learning of Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jian Sun,et al.  ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Trevor Darrell,et al.  Fully Convolutional Multi-Class Multiple Instance Learning , 2014, ICLR.

[14]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[18]  Jean Ponce,et al.  Multi-class cosegmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Ejaz Ahmed,et al.  Semantic Object Selection , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Guosheng Lin,et al.  Efficient Piecewise Training of Deep Structured Models for Semantic Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Wataru Shimoda,et al.  Distinct Class-Specific Saliency Maps for Weakly Supervised Semantic Segmentation , 2016, ECCV.

[22]  Xinlei Chen,et al.  Enriching Visual Knowledge Bases via Object Discovery and Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[24]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[25]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[27]  Lars Petersson,et al.  Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation , 2016, ECCV.

[28]  Yi Liu,et al.  Large-scale image annotation using visual synset , 2011, 2011 International Conference on Computer Vision.

[29]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Christoph H. Lampert,et al.  Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation , 2016, ECCV.

[31]  Larry S. Davis,et al.  G-CNN: An Iterative Grid Based Object Detector , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Thomas Hofmann,et al.  Support Vector Machines for Multiple-Instance Learning , 2002, NIPS.

[33]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[34]  Jianbo Shi,et al.  Semantic Segmentation with Boundary Neural Fields , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[36]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37]  Andrea Vedaldi,et al.  Learning the semantic structure of objects from Web supervision , 2016, ArXiv.

[38]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[39]  Jean Ponce,et al.  Discriminative clustering for image co-segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[40]  Xiaojuan Qi,et al.  Augmented Feedback in Semantic Segmentation Under Image Level Supervision , 2016, ECCV.

[41]  Yao Zhao,et al.  Learning to segment with image-level annotations , 2016, Pattern Recognit..

[42]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[43]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.