论文信息 - FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation

FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation

Over the past few years, we have witnessed the success of deep learning in image recognition thanks to the availability of large-scale human-annotated datasets such as PASCAL VOC, ImageNet, and COCO. Although these datasets have covered a wide range of object categories, there are still a significant number of objects that are not included. Can we perform the same task without a lot of human annotations? In this paper, we are interested in few-shot object segmentation where the number of annotated training examples are limited to 5 only. To evaluate and validate the performance of our approach, we have built a few-shot segmentation dataset, FSS-1000, which consists of 1000 object classes with pixelwise annotation of ground-truth segmentation. Unique in FSS-1000, our dataset contains significant number of objects that have never been seen or annotated in previous datasets, such as tiny daily objects, merchandise, cartoon characters, logos, etc. We build our baseline model using standard backbone networks such as VGG-16, ResNet-101, and Inception. To our surprise, we found that training our model from scratch using FSS-1000 achieves comparable and even better results than training with weights pre-trained by ImageNet which is more than 100 times larger than FSS-1000. Both our approach and dataset are simple, effective, and easily extensible to learn segmentation of new object classes given very few annotated training examples. Dataset is available at https://github.com/HKUSTCV/FSS-1000

[1] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.

[2] Pietro Perona,et al. One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3] Tao Xiang,et al. Learning to Compare: Relation Network for Few-Shot Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[4] Ronan Collobert,et al. Recurrent Convolutional Neural Networks for Scene Labeling , 2014, ICML.

[5] Andrew Blake,et al. "GrabCut" , 2004, ACM Trans. Graph..

[6] Jitendra Malik,et al. Simultaneous Detection and Segmentation , 2014, ECCV.

[7] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[9] Oriol Vinyals,et al. Matching Networks for One Shot Learning , 2016, NIPS.

[10] Trevor Darrell,et al. Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11] Krista A. Ehinger,et al. SUN Database: Exploring a Large Collection of Scene Categories , 2014, International Journal of Computer Vision.

[12] Camille Couprie,et al. Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Jian Sun,et al. Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Hong Yu,et al. Meta Networks , 2017, ICML.

[16] Bartunov Sergey,et al. Meta-Learning with Memory-Augmented Neural Networks , 2016 .

[17] Yi Li,et al. Fully Convolutional Instance-Aware Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18] Michael S. Bernstein,et al. ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[19] Jiashi Feng,et al. PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[20] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.

[21] Luca Maria Gambardella,et al. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images , 2012, NIPS.

[22] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[23] Hugo Larochelle,et al. Optimization as a Model for Few-Shot Learning , 2016, ICLR.

[24] Matthias Bethge,et al. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness , 2018, ICLR.

[25] Shawn D. Newsam,et al. Bag-of-visual-words and spatial extensions for land-use classification , 2010, GIS '10.

[26] Ross B. Girshick,et al. Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[27] Luca Bertinetto,et al. Learning feed-forward one-shot learners , 2016, NIPS.

[28] Andrew Zisserman,et al. Counting in the Wild , 2016, ECCV.

[29] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[30] Rui Yao,et al. CANet: Class-Agnostic Segmentation Networks With Iterative Refinement and Attentive Few-Shot Learning , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31] Daan Wierstra,et al. Meta-Learning with Memory-Augmented Neural Networks , 2016, ICML.

[32] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[33] Richard S. Zemel,et al. Prototypical Networks for Few-shot Learning , 2017, NIPS.

[34] Sergey Levine,et al. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks , 2017, ICML.

[35] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[36] Joshua B. Tenenbaum,et al. Human-level concept learning through probabilistic program induction , 2015, Science.

[37] Byron Boots,et al. One-Shot Learning for Semantic Segmentation , 2017, BMVC.

[38] Kaiming He,et al. Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[39] Alexei A. Efros,et al. Few-Shot Segmentation Propagation with Guided Networks , 2018, ArXiv.

[40] Gregory R. Koch,et al. Siamese Neural Networks for One-Shot Image Recognition , 2015 .