Variational Prototype Inference for Few-Shot Semantic Segmentation

In this paper, we propose variational prototype inference to address few-shot semantic segmentation in a probabilistic framework. A probabilistic latent variable model infers the distribution of the prototype that is treated as the latent variable. We formulate the optimization as a variational inference problem, which is established with an amortized inference network based on an auto-encoder architecture. The probabilistic modeling of the prototype enhances its generalization ability to handle the inherent uncertainty caused by limited data and the huge intra-class variations of objects. Moreover, it offers a principled way to incorporate the prototype extracted from support images into the prediction of the segmentation maps for query images. We conduct extensive experimental evaluations on three benchmark datasets. Ablation studies show the effectiveness of variational prototype inference for few-shot semantic segmentation by probabilistic modeling. On all three benchmarks, our proposal achieves high segmentation accuracy and surpasses previous methods by considerable margins.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Michael I. Jordan,et al.  An Introduction to Variational Methods for Graphical Models , 1999, Machine Learning.

[3]  Rui Yao,et al.  CANet: Class-Agnostic Segmentation Networks With Iterative Refinement and Attentive Few-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Byron Boots,et al.  One-Shot Learning for Semantic Segmentation , 2017, BMVC.

[5]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[6]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[7]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[8]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[9]  Chi-Keung Tang,et al.  FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[12]  Klaus H. Maier-Hein,et al.  A Probabilistic U-Net for Segmentation of Ambiguous Images , 2018, NeurIPS.

[13]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[14]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Alexander S. Ecker,et al.  One-Shot Segmentation in Clutter , 2018, ICML.

[16]  Yi Yang,et al.  SG-One: Similarity Guidance Network for One-Shot Semantic Segmentation , 2018, IEEE Transactions on Cybernetics.

[17]  Martin Jägersand,et al.  AMP: Adaptive Masked Proxies for Few-Shot Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Chi Zhang,et al.  Pyramid Graph Networks With Connection Attentions for Region-Based One-Shot Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Eric P. Xing,et al.  Few-Shot Semantic Segmentation with Prototype Learning , 2018, BMVC.

[20]  Jiashi Feng,et al.  PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Cees G. M. Snoek,et al.  Learning to Learn Variational Semantic Memory , 2020, NeurIPS.

[22]  Gang Yu,et al.  Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation , 2019, AAAI.

[23]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Sergey Levine,et al.  Probabilistic Model-Agnostic Meta-Learning , 2018, NeurIPS.

[25]  Alexei A. Efros,et al.  Conditional Networks for Few-Shot Semantic Segmentation , 2018, ICLR.

[26]  Bingbing Ni,et al.  Variational Few-Shot Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[28]  Honglak Lee,et al.  Learning Structured Output Representation using Deep Conditional Generative Models , 2015, NIPS.

[29]  Daan Wierstra,et al.  Stochastic Backpropagation and Approximate Inference in Deep Generative Models , 2014, ICML.

[30]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[31]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[33]  Khoi Nguyen,et al.  Feature Weighting and Boosting for Few-Shot Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[34]  Alexei A. Efros,et al.  Few-Shot Segmentation Propagation with Guided Networks , 2018, ArXiv.

[35]  Sergey Levine,et al.  Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[36]  Ian D. Reid,et al.  RefineNet: Multi-path Refinement Networks for High-Resolution Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.