ARNet: Attention-Based Refinement Network for Few-Shot Semantic Segmentation

Semantic segmentation is a challenging computer vision task that aims to classify objects at the pixel level. Previous deep learning methods have made progress, but the labeling work they require is very time-consuming. Few-shot semantic segmentation can alleviate this problem. In this paper, we propose an Attention-based Refinement Network (ARNet) for few-shot semantic segmentation, which consists of three branches: the guidance branch, the segmentation branch, and the refinement branch. The Residual Attention Module (RAM) highlights the features from the segmentation branch, giving better guidance to the refinement branch. The Parallel Dilated Convolution Module (PDCM) at the end of the refinement branch then refines the segmentation results. Experiments on the PASCAL VOC 2012 dataset show that our model achieves a mean Intersection-over-Union (mIoU) score of 48.1% for one-shot segmentation and 49.1% for five-shot segmentation, outperforming state-of-the-art methods by 1.8% and 2.0%, respectively.
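To make the reported metric concrete, the following is a minimal sketch of how a mean Intersection-over-Union score could be computed. It is not the paper's evaluation code. It assumes binary foreground masks stored as nested lists of 0/1, and it assumes that intersections and unions are accumulated per class over all test images before averaging across classes. The function names `class_iou` and `mean_iou` and the class labels are hypothetical.

```python
def class_iou(pairs):
    """IoU for one class, accumulated over (prediction, ground truth) mask pairs.

    Each mask is a nested list of 0/1 values; both masks in a pair are
    assumed to have the same shape.
    """
    intersection = 0
    union = 0
    for pred, target in pairs:
        for pred_row, target_row in zip(pred, target):
            for p, t in zip(pred_row, target_row):
                if p and t:
                    intersection += 1
                if p or t:
                    union += 1
    # Assumption: a class with no foreground pixels anywhere scores 1.0.
    return intersection / union if union else 1.0


def mean_iou(pairs_by_class):
    """Average the per-class IoU over all classes (assumed protocol)."""
    scores = [class_iou(pairs) for pairs in pairs_by_class.values()]
    return sum(scores) / len(scores)


# Toy example with two hypothetical classes and 2x2 masks.
example = {
    "cat": [([[1, 1], [0, 0]], [[1, 0], [0, 0]])],  # intersection 1, union 2
    "dog": [([[1, 0], [0, 1]], [[1, 0], [0, 1]])],  # intersection 2, union 2
}
print(mean_iou(example))  # 0.75
```

Under these assumptions, a score of 48.1% would mean that, averaged over the evaluated classes, predicted and ground-truth foreground regions overlap on roughly 48% of their combined area.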
