Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation

Acquiring sufficient ground-truth supervision to train deep visual models has been a bottleneck over the years due to the data-hungry nature of deep learning. This is exacerbated in some structured prediction tasks, such as semantic segmentation, which requires pixel-level annotations. This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths, which can be used for training more accurate segmentation models. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes, and the underlying relations between a pair of images are characterized by an efficient co-attention mechanism. Moreover, in order to prevent the model from paying excessive attention to common semantics only, we further propose a graph dropout layer, encouraging the model to learn more accurate and complete object responses. The whole network is end-to-end trainable by iterative message passing, which propagates interaction cues over the images to progressively improve the performance. We conduct experiments on the popular PASCAL VOC 2012 and COCO benchmarks, and our model yields state-of-the-art performance. Our code is available at: this https URL.

[1]  Huchuan Lu,et al.  Joint Learning of Saliency Detection and Weakly Supervised Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[2]  Wenyu Liu,et al.  Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Razvan Pascanu,et al.  A simple neural network module for relational reasoning , 2017, NIPS.

[4]  Keiji Yanai,et al.  Self-Supervised Difference Detection for Weakly-Supervised Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[6]  Jian Sun,et al.  ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[8]  Yunchao Wei,et al.  Integral Object Mining via Online Attention Accumulation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[9]  Ralph R. Martin,et al.  Associating Inter-image Salient Instances for Weakly Supervised Semantic Segmentation , 2018, ECCV.

[10]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[12]  Ling Shao,et al.  Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[13]  Ling Shao,et al.  Learning Compositional Neural Information Fusion for Human Parsing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Suha Kwak,et al.  Weakly Supervised Learning of Instance Segmentation With Inter-Pixel Relations , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  L. Gool,et al.  Video Object Segmentation with Episodic Graph Memory Networks , 2020, ECCV.

[16]  Yoshua Bengio,et al.  GMNN: Graph Markov Neural Networks , 2019, ICML.

[17]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[18]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[19]  Yizhou Yu,et al.  Multi-evidence Filtering and Fusion for Multi-label Classification, Object Detection and Semantic Segmentation Based on Weakly Supervised Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[20]  Christoph H. Lampert,et al.  Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation , 2016, ECCV.

[21]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[22]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[23]  Samuel S. Schoenholz,et al.  Neural Message Passing for Quantum Chemistry , 2017, ICML.

[24]  Sungroh Yoon,et al.  FickleNet: Weakly and Semi-Supervised Semantic Image Segmentation Using Stochastic Inference , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Hyunjung Shim,et al.  Attention-Based Dropout Layer for Weakly Supervised Object Localization , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Bernt Schiele,et al.  Simple Does It: Weakly Supervised Instance and Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Abhinav Gupta,et al.  The More You Know: Using Knowledge Graphs for Image Classification , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Abhinav Gupta,et al.  Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Luc Van Gool,et al.  Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation , 2020, ECCV.

[30]  Ling Shao,et al.  Hierarchical Human Parsing With Typed Part-Relation Reasoning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Yao Zhao,et al.  Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Paul Vernaza,et al.  Learning Random-Walk Label Propagation for Weakly-Supervised Semantic Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Xiaojuan Qi,et al.  Augmented Feedback in Semantic Segmentation Under Image Level Supervision , 2016, ECCV.

[34]  Christopher Joseph Pal,et al.  Delving Deeper into Convolutional Networks for Learning Video Representations , 2015, ICLR.

[35]  Pietro Liò,et al.  Graph Attention Networks , 2017, ICLR.

[36]  Huimin Ma,et al.  Weakly-Supervised Semantic Segmentation by Iterative Affinity Learning , 2020, International Journal of Computer Vision.

[37]  Huimin Ma,et al.  Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Tieniu Tan,et al.  Learning Integral Objects With Intra-Class Discriminator for Weakly-Supervised Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Luc Van Gool,et al.  Exploring Cross-Image Pixel Contrast for Semantic Segmentation , 2021, ArXiv.

[42]  Suha Kwak,et al.  Learning Pixel-Level Semantic Affinity with Image-Level Supervision for Weakly Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Sanja Fidler,et al.  3D Graph Neural Networks for RGBD Semantic Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[44]  Philip H. S. Torr,et al.  Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation , 2017, BMVC.

[45]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[46]  Song-Chun Zhu,et al.  Cascaded Parsing of Human-Object Interaction Recognition , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47]  Ling Shao,et al.  Contrast-Attentive Thoracic Disease Recognition With Dual-Weighting Graph Reasoning , 2021, IEEE Transactions on Medical Imaging.

[48]  Yun Fu,et al.  Tell Me Where to Look: Guided Attention Inference Network , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[49]  Lars Petersson,et al.  Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation , 2016, ECCV.

[50]  Tieniu Tan,et al.  CIAN: Cross-Image Affinity Net for Weakly Supervised Semantic Segmentation , 2018, AAAI.

[51]  Yunchao Wei,et al.  Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[52]  Stefan Roth,et al.  Single-Stage Semantic Segmentation From Image Labels , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Ling Shao,et al.  Motion-Attentive Transition for Zero-Shot Video Object Segmentation , 2020, AAAI.

[54]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Zhuowen Tu,et al.  Deeply-Supervised Nets , 2014, AISTATS.

[56]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[57]  Song-Chun Zhu,et al.  Hierarchical Human Semantic Parsing With Comprehensive Part-Relation Modeling , 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[58]  Yan Huang,et al.  Box-Driven Class-Wise Region Masking and Filling Rate Guided Loss for Weakly Supervised Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Xilin Chen,et al.  Self-Supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[60]  Zhuowen Tu,et al.  Deeply Supervised Salient Object Detection with Short Connections , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[61]  Qiaosong Wang,et al.  Weakly-Supervised Semantic Segmentation via Sub-Category Exploration , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Xiao Han,et al.  Weakly Supervised Semantic Segmentation with Boundary Exploration , 2020, ECCV.

[63]  Mingjie Sun,et al.  Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach , 2019, AAAI.

[64]  Jianbing Shen,et al.  MATNet: Motion-Attentive Transition Network for Zero-Shot Video Object Segmentation , 2020, IEEE Transactions on Image Processing.

[65]  Ah Chung Tsoi,et al.  The Graph Neural Network Model , 2009, IEEE Transactions on Neural Networks.

[66]  Yunchao Wei,et al.  Self-Erasing Network for Integral Object Attention , 2018, NeurIPS.

[67]  Tieniu Tan,et al.  Employing Multi-estimations for Weakly-Supervised Semantic Segmentation , 2020, ECCV.