Annotation-Free and One-Shot Learning for Instance Segmentation of Homogeneous Object Clusters

We propose a novel approach for instance segmen- tation given an image of homogeneous object clus- ter (HOC). Our learning approach is one-shot be- cause a single video of an object instance is cap- tured and it requires no human annotation. Our in- tuition is that images of homogeneous objects can be effectively synthesized based on structure and illumination priors derived from real images. A novel solver is proposed that iteratively maximizes our structured likelihood to generate realistic im- ages of HOC. Illumination transformation scheme is applied to make the real and synthetic images share the same illumination condition. Extensive experiments and comparisons are performed to ver- ify our method. We build a dataset consisting of pixel-level annotated images of HOC. The dataset and code will be published with the paper.

[1]  Donald R. Jones,et al.  Efficient Global Optimization of Expensive Black-Box Functions , 1998, J. Glob. Optim..

[2]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[3]  Jonas Mockus,et al.  On Bayesian Methods for Seeking the Extremum , 1974, Optimization Techniques.

[4]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[5]  Giulio Sandini,et al.  2nd European conference on computer vision , 1992, Image Vis. Comput..

[6]  Pietro Perona,et al.  One-shot learning of object categories , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[8]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Hao Su,et al.  Crowdsourcing Annotations for Visual Object Detection , 2012, HCOMP@AAAI.

[12]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[13]  C. Lawrence Zitnick,et al.  Edge Boxes: Locating Object Proposals from Edges , 2014, ECCV.

[14]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Chi-Keung Tang,et al.  KNN Matting , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Roberto Cipolla,et al.  SynthCam3D: Semantic Understanding With Synthetic Indoor Scenes , 2015, ArXiv.

[18]  F J W-M Leong,et al.  Correction of uneven illumination (vignetting) in digital microscopy images , 2003, Journal of clinical pathology.

[19]  Leon Sixt,et al.  RenderGAN: Generating Realistic Labeled Data , 2016, Front. Robot. AI.

[20]  Luc Van Gool,et al.  One-Shot Video Object Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yi Li,et al.  Fully Convolutional Instance-Aware Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[23]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[24]  Michael I. Jordan,et al.  Advances in Neural Information Processing Systems 30 , 1995 .

[25]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Leonidas J. Guibas,et al.  Render for CNN: Viewpoint Estimation in Images Using CNNs Trained with Rendered 3D Model Views , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[27]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[28]  Markus Schoeler,et al.  Semantic Pose Using Deep Networks Trained on Synthetic RGB-D , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[30]  Kate Saenko,et al.  Learning Deep Object Detectors from 3D Models , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Simon Osindero,et al.  Conditional Generative Adversarial Nets , 2014, ArXiv.