Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation

Despite remarkable progress, weakly supervised segmentation approaches remain inferior to their fully supervised counterparts. We observe that the performance gap comes mainly from their limited ability to learn high-quality dense object localization maps from image-level supervision. To close this gap, we revisit dilated convolution [1] and show how it can be used in a novel way to overcome this critical limitation of weakly supervised segmentation approaches. Specifically, we find that varying the dilation rate effectively enlarges the receptive fields of convolutional kernels and, more importantly, transfers surrounding discriminative information to non-discriminative object regions, promoting the emergence of these regions in the object localization maps. We then design a generic classification network equipped with convolutional blocks of different dilation rates. It produces dense and reliable object localization maps and benefits both weakly- and semi-supervised semantic segmentation. Despite its apparent simplicity, the proposed approach outperforms state-of-the-art methods. In particular, it achieves 60.8% and 67.6% mIoU on the Pascal VOC 2012 test set in the weakly supervised (only image-level labels available) and semi-supervised (1,464 segmentation masks available) settings, respectively, both of which are new state-of-the-art results.
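The effect of the dilation rate described above can be illustrated with a minimal one-dimensional sketch. It is not the network from the paper: the function names, the toy activation, the all-ones 3-tap kernel, the rates (1, 3, 6), and the plain averaging used to fuse the responses are all assumptions chosen for illustration. The sketch only shows that a larger dilation rate widens the span of a kernel and carries a single strong (discriminative) response to positions farther away.

```python
def effective_kernel_size(kernel_size, dilation):
    """Span, in input positions, covered by one dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, weights, dilation):
    """Zero-padded, same-length 1-D dilated convolution (cross-correlation)."""
    center = len(weights) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i + (j - center) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


# Toy activation (hypothetical): one strongly discriminative position at index 6.
activation = [0.0] * 13
activation[6] = 1.0
weights = [1.0, 1.0, 1.0]  # illustrative 3-tap kernel, not learned weights
rates = (1, 3, 6)          # illustrative dilation rates

responses = {d: dilated_conv1d(activation, weights, d) for d in rates}

# Assumed fusion for illustration only: plain average over the rates.
fused = [sum(responses[d][i] for d in rates) / len(rates)
         for i in range(len(activation))]

for d in rates:
    reached = [i for i, v in enumerate(responses[d]) if v > 0]
    print(f"rate={d}: span={effective_kernel_size(3, d)}, reached positions={reached}")
```

With rate 1 the response stays next to the original peak, while larger rates place it at positions progressively farther away, so the fused map is nonzero over a wider region than any single rate produces on its own.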

[1] Li Fei-Fei et al. ImageNet: A large-scale hierarchical image database, 2009, CVPR.

[2] Bingbing Ni et al. HCP: A Flexible CNN Framework for Multi-Label Image Classification, 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Lars Petersson et al. Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation, 2016, ECCV.

[4] Jian Sun et al. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation, 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Trevor Darrell et al. Fully Convolutional Multi-Class Multiple Instance Learning, 2014, ICLR.

[6] Luc Van Gool et al. The Pascal Visual Object Classes Challenge: A Retrospective, 2014, International Journal of Computer Vision.

[7] Trevor Darrell et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Christoph H. Lampert et al. Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, 2016, ECCV.

[9] Seunghoon Hong et al. Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10] Wei Liu et al. Deep Self-Taught Learning for Weakly Supervised Object Localization, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11] Trevor Darrell et al. Caffe: Convolutional Architecture for Fast Feature Embedding, 2014, ACM Multimedia.

[12] Yunchao Wei et al. Computational Baby Learning, 2014, ArXiv.

[13] Trevor Darrell et al. Constrained Convolutional Neural Networks for Weakly Supervised Segmentation, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[14] Dong Xu et al. SPFTN: A Self-Paced Fine-Tuning Network for Segmenting Objects in Weakly Labelled Videos, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Xiaojuan Qi et al. Augmented Feedback in Semantic Segmentation Under Image Level Supervision, 2016, ECCV.

[16] Jonathan T. Barron et al. Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17] Subhransu Maji et al. Semantic contours from inverse detectors, 2011, 2011 International Conference on Computer Vision.

[18] Seunghoon Hong et al. Weakly Supervised Semantic Segmentation Using Web-Crawled Videos, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Dahun Kim et al. Two-Phase Learning for Weakly Supervised Object Localization, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[20] Andrew Zisserman et al. Very Deep Convolutional Networks for Large-Scale Image Recognition, 2014, ICLR.

[21] Changhu Wang et al. Surveillance Video Parsing with Single Frame Supervision, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Yunchao Wei et al. STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation, 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Shih-Fu Chang et al. PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24] Shih-Fu Chang et al. Visual Translation Embedding Network for Visual Relation Detection, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Bolei Zhou et al. Learning Deep Features for Discriminative Localization, 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26] Yunchao Wei et al. Learning to Segment Human by Watching YouTube, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27] Yao Zhao et al. Learning to segment with image-level annotations, 2016, Pattern Recognit.

[28] Yi Yang et al. Adversarial Complementary Learning for Weakly Supervised Object Localization, 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29] Yunchao Wei et al. Bottom-Up Top-Down Cues for Weakly-Supervised Semantic Segmentation, 2016, EMMCVPR.

[30] Zhe L. Lin et al. Top-Down Neural Attention by Excitation Backprop, 2016, International Journal of Computer Vision.

[31] Jian Sun et al. BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32] Abhinav Gupta et al. Temporal Dynamic Graph LSTM for Action-Driven Video Object Detection, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33] Iasonas Kokkinos et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[34] Lars Petersson et al. Incorporating Network Built-in Priors in Weakly-Supervised Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35] Iasonas Kokkinos et al. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, 2014, ICLR.

[36] Fei-Fei Li et al. What's the Point: Semantic Segmentation with Point Supervision, 2015, ECCV.

[37] Pietro Perona et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.

[38] Ronan Collobert et al. From image-level to pixel-level labeling with Convolutional Networks, 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Vibhav Vineet et al. Conditional Random Fields as Recurrent Neural Networks, 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[40] Sinisa Todorovic et al. Combining Bottom-Up, Top-Down, and Smoothness Cues for Weakly Supervised Image Segmentation, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41] Bernt Schiele et al. Simple Does It: Weakly Supervised Instance and Semantic Segmentation, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42] Wataru Shimoda et al. Distinct Class-Specific Saliency Maps for Weakly Supervised Semantic Segmentation, 2016, ECCV.

[43] Yang Yang et al. Learning Category-Specific 3D Shape Models from Weakly Labeled 2D Images, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44] George Papandreou et al. Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation, 2015, ArXiv.

[45] Yao Zhao et al. Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach, 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).