SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation

One-shot image semantic segmentation poses a challenging task of recognizing the object regions from unseen categories with only one annotated example as supervision. In this article, we propose a simple yet effective similarity guidance network to tackle the one-shot (SG-One) segmentation problem. We aim at predicting the segmentation mask of a query image with the reference to one densely labeled support image of the same category. To obtain the robust representative feature of the support image, we first adopt a masked average pooling strategy for producing the guidance features by only taking the pixels belonging to the support image into account. We then leverage the cosine similarity to build the relationship between the guidance features and features of pixels from the query image. In this way, the possibilities embedded in the produced similarity maps can be adopted to guide the process of segmenting objects. Furthermore, our SG-One is a unified framework that can efficiently process both support and query images within one network and be learned in an end-to-end manner. We conduct extensive experiments on Pascal VOC 2012. In particular, our SG-One achieves the mIoU score of 46.3%, surpassing the baseline methods.

[1]  Qiang Ji,et al.  Weakly Supervised Facial Action Unit Recognition With Domain Knowledge , 2018, IEEE Transactions on Cybernetics.

[2]  Luc Van Gool,et al.  The 2017 DAVIS Challenge on Video Object Segmentation , 2017, ArXiv.

[3]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[4]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Jian Sun,et al.  ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Chenggang Yan,et al.  Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks , 2020, IEEE Transactions on Cybernetics.

[7]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[8]  Sheng Tang,et al.  Boundary Perception Guidance: A Scribble-Supervised Semantic Segmentation Approach , 2019, IJCAI.

[9]  Yunchao Wei,et al.  Self-Erasing Network for Integral Object Attention , 2018, NeurIPS.

[10]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[11]  Alexei A. Efros,et al.  Conditional Networks for Few-Shot Semantic Segmentation , 2018, ICLR.

[12]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Qiang Wu,et al.  Coupled Bilinear Discriminant Projection for Cross-View Gait Recognition , 2020, IEEE Transactions on Circuits and Systems for Video Technology.

[14]  Xia Li,et al.  Weakly Supervised Salient Object Detection With Spatiotemporal Cascade Neural Networks , 2019, IEEE Transactions on Circuits and Systems for Video Technology.

[15]  Yuri Boykov,et al.  Normalized Cut Loss for Weakly-Supervised CNN Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Fei-Fei Li,et al.  What's the Point: Semantic Segmentation with Point Supervision , 2015, ECCV.

[17]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[18]  Yi Yang,et al.  Self-produced Guidance for Weakly-supervised Object Localization , 2018, ECCV.

[19]  Byron Boots,et al.  One-Shot Learning for Semantic Segmentation , 2017, BMVC.

[20]  Bernt Schiele,et al.  Simple Does It: Weakly Supervised Instance and Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jinjun Xiong,et al.  SPGNet: Semantic Prediction Guidance for Scene Parsing , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[22]  Yunchao Wei,et al.  CCNet: Criss-Cross Attention for Semantic Segmentation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[24]  Alexander G. Schwing,et al.  VideoMatch: Matching based Video Object Segmentation , 2018, ECCV.

[25]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[26]  Yu Wu,et al.  Progressive Learning for Person Re-Identification With One Example , 2019, IEEE Transactions on Image Processing.

[27]  Ehsan Adeli,et al.  3-D Fully Convolutional Networks for Multimodal Isointense Infant Brain Image Segmentation , 2019, IEEE Transactions on Cybernetics.

[28]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[29]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[30]  Deyu Meng,et al.  Instance Annotation via Optimal BoW for Weakly Supervised Object Localization , 2017, IEEE Transactions on Cybernetics.

[31]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[32]  Kalyan Sunkavalli,et al.  Fast Video Object Segmentation by Reference-Guided Mask Propagation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Soma Biswas,et al.  Preserving Semantic Relations for Zero-Shot Learning , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Luc Van Gool,et al.  A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jun-Sik Kim,et al.  Pixel-Level Matching for Video Object Segmentation Using Convolutional Neural Networks , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[37]  Xuelong Li,et al.  A Hybrid Level Set With Semantic Shape Constraint for Object Segmentation , 2019, IEEE Transactions on Cybernetics.

[38]  Michal Irani,et al.  Co-segmentation by Composition , 2013, 2013 IEEE International Conference on Computer Vision.

[39]  Yunchao Wei,et al.  Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[41]  Eric P. Xing,et al.  Few-Shot Semantic Segmentation with Prototype Learning , 2018, BMVC.

[42]  Yao Zhao,et al.  Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Thomas Brox,et al.  FlowNet: Learning Optical Flow with Convolutional Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  Yunchao Wei,et al.  Integral Object Mining via Online Attention Accumulation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[45]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Ming-Hsuan Yang,et al.  SegFlow: Joint Learning for Video Object Segmentation and Optical Flow , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[47]  Yi Yang,et al.  Adversarial Complementary Learning for Weakly Supervised Object Localization , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[48]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[49]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[50]  Yunchao Wei,et al.  Weakly Supervised Scene Parsing with Point-based Distance Metric Learning , 2018, AAAI.

[51]  Weisi Lin,et al.  Scale and Orientation Invariant Text Segmentation for Born-Digital Compound Images , 2015, IEEE Transactions on Cybernetics.

[52]  Carme Torras,et al.  Consistent Depth Video Segmentation Using Adaptive Surface Models , 2015, IEEE Transactions on Cybernetics.

[53]  Xuelong Li,et al.  High-Order Energies for Stereo Segmentation , 2016, IEEE Transactions on Cybernetics.

[54]  Dinggang Shen,et al.  Weakly Supervised Deep Learning for Brain Disease Prognosis Using MRI and Incomplete Clinical Scores , 2020, IEEE Transactions on Cybernetics.

[55]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).