A Self-Distillation Embedded Supervised Affinity Attention Model for Few-Shot Segmentation

Few-shot semantic segmentation is a challenging task of predicting object categories in pixel-wise with only few annotated samples. However, existing approaches still face two main challenges. First, huge feature distinction between support and query images causes knowledge transferring barrier, which harms the segmentation performance. Second, few support samples cause unrepresentative of support features, hardly to guide high-quality query segmentation. To deal with the above two issues, we propose self-distillation embedded supervised affinity attention model (SD-AANet) to improve the performance of fewshot segmentation task. Specifically, the self-distillation guided prototype module (SDPM) extracts intrinsic prototype by selfdistillation between support and query to capture representative features. The supervised affinity attention module (SAAM) adopts support ground truth to guide the production of high quality query attention map, which can learn affinity information to focus on whole area of query target. Extensive experiments prove that our SD-AANet significantly improves the performance comparing with existing methods. Comprehensive ablation experiments and visualization studies also show the significant effect of SDPM and SAAM for few-shot segmentation task. On benchmark datasets, PASCAL-5 and COCO-20, our proposed SD-AANet both achieve state-of-the-art results. Our code will be publicly available soon.

[1]  Nikos Komodakis,et al.  Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer , 2016, ICLR.

[2]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[3]  Varun Jampani,et al.  Adaptive Prototype Learning and Allocation for Few-Shot Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Luping Zhou,et al.  BriNet: Towards Bridging the Intra-class and Inter-class Gaps in One-Shot Segmentation , 2020, BMVC.

[5]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Kaisheng Ma,et al.  Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[7]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Jian Sun,et al.  Identity Mappings in Deep Residual Networks , 2016, ECCV.

[9]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[10]  Alexei A. Efros,et al.  Conditional Networks for Few-Shot Semantic Segmentation , 2018, ICLR.

[11]  Hengshuang Zhao,et al.  Prior Guided Feature Enrichment Network for Few-Shot Segmentation , 2020, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12]  Yi Yang,et al.  SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation , 2018, IEEE Transactions on Cybernetics.

[13]  Chi Zhang,et al.  Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[14]  Guangliang Cheng,et al.  Embedded Knowledge Distillation in Depth-level Dynamic Neural Network , 2021, ArXiv.

[15]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[16]  Jiashi Feng,et al.  PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[17]  Jimin Xiao,et al.  Self-Guided and Cross-Guided Learning for Few-Shot Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Xiantong Zhen,et al.  Few-Shot Semantic Segmentation with Democratic Attention Networks , 2020, ECCV.

[19]  Jun Fu,et al.  Dual Attention Network for Scene Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Bartunov Sergey,et al.  Meta-Learning with Memory-Augmented Neural Networks , 2016 .

[21]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[22]  Khoi Nguyen,et al.  Feature Weighting and Boosting for Few-Shot Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[23]  In-So Kweon,et al.  CBAM: Convolutional Block Attention Module , 2018, ECCV.

[24]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Gang Yu,et al.  Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation , 2019, AAAI.

[26]  Guosheng Lin,et al.  CRNet: Cross-Reference Networks for Few-Shot Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Yunchao Wei,et al.  CCNet: Criss-Cross Attention for Semantic Segmentation. , 2020, IEEE transactions on pattern analysis and machine intelligence.

[28]  Qixiang Ye,et al.  Prototype Mixture Models for Few-shot Semantic Segmentation , 2020, ECCV.

[29]  Xuming He,et al.  Part-aware Prototype Network for Few-shot Semantic Segmentation , 2020, ECCV.

[30]  Changming Sun,et al.  Knowledge Adaptation for Efficient Semantic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Rui Yao,et al.  CANet: Class-Agnostic Segmentation Networks With Iterative Refinement and Attentive Few-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Byron Boots,et al.  One-Shot Learning for Semantic Segmentation , 2017, BMVC.

[33]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[34]  Bhuvana Ramabhadran,et al.  Efficient Knowledge Distillation from an Ensemble of Teachers , 2017, INTERSPEECH.

[35]  Yi Zhang,et al.  PSANet: Point-wise Spatial Attention Network for Scene Parsing , 2018, ECCV.

[36]  Hugo Larochelle,et al.  Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[37]  Martin Jägersand,et al.  AMP: Adaptive Masked Proxies for Few-Shot Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[38]  Chen Gong,et al.  SAL:Selection and Attention Losses for Weakly Supervised Semantic Segmentation , 2020, IEEE Transactions on Multimedia.

[39]  Sangheum Hwang,et al.  Self-Knowledge Distillation: A Simple Way for Better Generalization , 2020, ArXiv.

[40]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[41]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[42]  Hugo Larochelle,et al.  Meta-Dataset: A Dataset of Datasets for Learning to Learn from Few Examples , 2019, ICLR.

[43]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Siddhartha Gairola,et al.  SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation , 2020, IJCAI.

[47]  Eric P. Xing,et al.  Few-Shot Semantic Segmentation with Prototype Learning , 2018, BMVC.

[48]  Jianfei Cai,et al.  Decoupled Spatial Neural Attention for Weakly Supervised Semantic Segmentation , 2018, IEEE Transactions on Multimedia.

[49]  Geoffrey E. Hinton,et al.  Distilling the Knowledge in a Neural Network , 2015, ArXiv.

[50]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[51]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[52]  Enhua Wu,et al.  Squeeze-and-Excitation Networks , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.