One-Shot Segmentation in Clutter

We tackle the problem of one-shot segmentation: finding and segmenting a previously unseen object in a cluttered scene based on a single instruction example. We propose a novel dataset, which we call $\textit{cluttered Omniglot}$. Using a baseline architecture combining a Siamese embedding for detection with a U-net for segmentation we show that increasing levels of clutter make the task progressively harder. Using oracle models with access to various amounts of ground-truth information, we evaluate different aspects of the problem and show that in this kind of visual search task, detection and segmentation are two intertwined problems, the solution to each of which helps solving the other. We therefore introduce $\textit{MaskNet}$, an improved model that attends to multiple candidate locations, generates segmentation proposals to mask out background clutter and selects among the segmented objects. Our findings suggest that such image recognition models based on an iterative refinement of object detection and foreground segmentation may provide a way to deal with highly cluttered scenes.

[1]  Abhishek Sharma,et al.  One Shot Joint Colocalization and Cosegmentation , 2017, ArXiv.

[2]  Gregory R. Koch,et al.  Siamese Neural Networks for One-Shot Image Recognition , 2015 .

[3]  Jian Sun,et al.  Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Ambedkar Dukkipati,et al.  Attentive Recurrent Comparators , 2017, ICML.

[5]  Wei Li,et al.  WebVision Challenge: Visual Learning and Understanding With Web Data , 2017, ArXiv.

[6]  Yannis Avrithis,et al.  Unsupervised deep object discovery for instance recognition , 2017, ArXiv.

[7]  Joshua B. Tenenbaum,et al.  Human-level concept learning through probabilistic program induction , 2015, Science.

[8]  Luc Van Gool,et al.  Deep Extreme Cut: From Extreme Points to Object Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[10]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[12]  Oriol Vinyals,et al.  Matching Networks for One Shot Learning , 2016, NIPS.

[13]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[14]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[15]  Ian D. Reid,et al.  RefineNet : MultiPath Refinement Networks with Identity Mappings for High-Resolution Semantic Segmentation , 2016 .

[16]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Svetlana Lazebnik,et al.  Where to Buy It: Matching Street Clothing Photos in Online Shops , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Bo Zhao,et al.  Clothing Cosegmentation for Shopping Images With Cluttered Background , 2016, IEEE Transactions on Multimedia.

[19]  Byron Boots,et al.  One-Shot Learning for Semantic Segmentation , 2017, BMVC.

[20]  Frank Keller,et al.  Extreme Clicking for Efficient Object Annotation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[21]  Bolei Zhou,et al.  Scene Parsing through ADE20K Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Michal Irani,et al.  Co-segmentation by Composition , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Luca Bertinetto,et al.  Learning feed-forward one-shot learners , 2016, NIPS.

[24]  Yang Liu,et al.  Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Seunghoon Hong,et al.  Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation , 2015, NIPS.

[28]  한보형,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015 .

[29]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Ronan Sicre,et al.  Particular object retrieval with integral max-pooling of CNN activations , 2015, ICLR.

[31]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[32]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[33]  Feiping Nie,et al.  Object Co-segmentation via Graph Optimized-Flexible Manifold Ranking , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  U. Neisser VISUAL SEARCH. , 1964, Scientific American.

[35]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[36]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Richard S. Zemel,et al.  Prototypical Networks for Few-shot Learning , 2017, NIPS.

[38]  Raquel Urtasun,et al.  Few-Shot Learning Through an Information Retrieval Lens , 2017, NIPS.