Coarse-to-Fine Semantic Segmentation From Image-Level Labels

Deep neural network-based semantic segmentation generally requires large-scale cost extensive annotations for training to obtain better performance. To avoid pixel-wise segmentation annotations that are needed for most methods, recently some researchers attempted to use object-level labels (e.g., bounding boxes) or image-level labels (e.g., image categories). In this paper, we propose a novel recursive coarse-to-fine semantic segmentation framework based on only image-level category labels. For each image, an initial coarse mask is first generated by a convolutional neural network-based unsupervised foreground segmentation model and then is enhanced by a graph model. The enhanced coarse mask is fed to a fully convolutional neural network to be recursively refined. Unlike the existing image-level label-based semantic segmentation methods, which require labeling of all categories for images that contain multiple types of objects, our framework only needs one label for each image and can handle images that contain multi-category objects. Only trained on ImageNet, our framework achieves comparable performance on the PASCAL VOC dataset with other image-level label-based state-of-the-art methods of semantic segmentation. Furthermore, our framework can be easily extended to foreground object segmentation task and achieves comparable performance with the state-of-the-art supervised methods on the Internet object dataset.

[1]  Wataru Shimoda,et al.  Distinct Class-Specific Saliency Maps for Weakly Supervised Semantic Segmentation , 2016, ECCV.

[2]  Yao Zhao,et al.  Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Michal Irani,et al.  Video Segmentation by Non-Local Consensus voting , 2014, BMVC.

[4]  Xin Pan,et al.  YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation. , 2017, IEEE transactions on pattern analysis and machine intelligence.

[6]  Mubarak Shah,et al.  Semi and Weakly Supervised Semantic Segmentation Using Generative Adversarial Network , 2017, ArXiv.

[7]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Ce Liu,et al.  Unsupervised Joint Object Discovery and Segmentation in Internet Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9]  Yunchao Wei,et al.  Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Emanuela Haller,et al.  Unsupervised Object Segmentation in Video by Efficient Selection of Highly Probable Positive Features , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[12]  Wenyu Liu,et al.  Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Marius Leordeanu,et al.  Unsupervised Learning from Video to Detect Foreground Objects in Single Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[15]  Jean Ponce,et al.  Discriminative clustering for image co-segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  James M. Rehg,et al.  The Secrets of Salient Object Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Xinlei Chen,et al.  Enriching Visual Knowledge Bases via Object Discovery and Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Kristen Grauman,et al.  Pixel Objectness , 2017, ArXiv.

[19]  Kaiming He,et al.  Feature Pyramid Networks for Object Detection , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Kristen Grauman,et al.  Active Image Segmentation Propagation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Sabine Süsstrunk,et al.  Webly Supervised Semantic Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Bernt Schiele,et al.  Simple Does It: Weakly Supervised Instance and Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Roberto Cipolla,et al.  Semantic object classes in video: A high-definition ground truth database , 2009, Pattern Recognit. Lett..

[24]  Seunghoon Hong,et al.  Weakly Supervised Semantic Segmentation Using Web-Crawled Videos , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Takeo Kanade,et al.  Distributed cosegmentation via submodular optimization on anisotropic diffusion , 2011, 2011 International Conference on Computer Vision.

[26]  Marius Leordeanu,et al.  Multiple Frames Matching for Object Discovery in Video , 2015, BMVC.

[27]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28]  George Papandreou,et al.  Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[29]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[30]  Stan Sclaroff,et al.  Saliency Detection: A Boolean Map Approach , 2013, 2013 IEEE International Conference on Computer Vision.

[31]  Ralph R. Martin,et al.  Associating Inter-image Salient Instances for Weakly Supervised Semantic Segmentation , 2018, ECCV.

[32]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[33]  Guosheng Lin,et al.  Weakly Supervised Semantic Segmentation Based on Web Image Co-segmentation , 2017, 1705.09052.

[34]  Jianfei Cai,et al.  Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation , 2018, IEEE Transactions on Multimedia.

[35]  Xiaogang Wang,et al.  Saliency detection by multi-context deep learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jian Sun,et al.  ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Jean Ponce,et al.  Multi-class cosegmentation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Trevor Darrell,et al.  Fully Convolutional Multi-Class Multiple Instance Learning , 2014, ICLR.

[39]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Seunghoon Hong,et al.  Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Yueting Zhuang,et al.  DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection , 2015, IEEE Transactions on Image Processing.

[42]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Xiaojuan Qi,et al.  Augmented Feedback in Semantic Segmentation Under Image Level Supervision , 2016, ECCV.

[45]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Vladimir Kolmogorov,et al.  "GrabCut": interactive foreground extraction using iterated graph cuts , 2004, ACM Trans. Graph..

[47]  Christoph H. Lampert,et al.  Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation , 2016, ECCV.

[48]  Ronan Collobert,et al.  Learning to Segment Object Candidates , 2015, NIPS.

[49]  Huimin Ma,et al.  Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[50]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[51]  Huchuan Lu,et al.  Saliency Detection via Absorbing Markov Chain , 2013, 2013 IEEE International Conference on Computer Vision.

[52]  Ian D. Reid,et al.  Bootstrapping the Performance of Webly Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[53]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[54]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[56]  Suha Kwak,et al.  Learning Pixel-Level Semantic Affinity with Image-Level Supervision for Weakly Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[57]  Lars Petersson,et al.  Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation , 2016, ECCV.

[58]  Fei-Fei Li,et al.  What's the Point: Semantic Segmentation with Point Supervision , 2015, ECCV.