Deep GrabCut for Object Selection

Most previous bounding-box-based segmentation methods assume the bounding box tightly covers the object of interest. However it is common that a rectangle input could be too large or too small. In this paper, we propose a novel segmentation approach that uses a rectangle as a soft constraint by transforming it into an Euclidean distance map. A convolutional encoder-decoder network is trained end-to-end by concatenating images with these distance maps as inputs and predicting the object masks as outputs. Our approach gets accurate segmentation results given sloppy rectangles while being general for both interactive segmentation and instance segmentation. We show our network extends to curve-based input without retraining. We further apply our network to instance-level semantic segmentation and resolve any overlap using a conditional random field. Experiments on benchmark datasets demonstrate the effectiveness of the proposed approaches.

[1]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[3]  Zhuowen Tu,et al.  MILCut: A Sweeping Line Multiple Instance Learning Paradigm for Interactive Image Segmentation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[5]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[6]  Hong Zhang,et al.  Multi-scale Patch Aggregation (MPA) for Simultaneous Detection and Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Lena Gorelick,et al.  GrabCut in One Cut , 2013, 2013 IEEE International Conference on Computer Vision.

[8]  Vladimir Kolmogorov,et al.  Joint optimization of segmentation and appearance models , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Ming-Hsuan Yang,et al.  Multi-instance object segmentation with occlusion handling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Jitendra Malik,et al.  Iterative Instance Segmentation , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Toby Sharp,et al.  Image segmentation with a bounding box prior , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[12]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[13]  Jian Sun,et al.  Convolutional feature masking for joint object and stuff segmentation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Yoshua Bengio,et al.  Understanding the difficulty of training deep feedforward neural networks , 2010, AISTATS.

[15]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[16]  Ismail Ben Ayed,et al.  Secrets of GrabCut and Kernel K-Means , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[18]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[19]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[20]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Jitendra Malik,et al.  Hypercolumns for object segmentation and fine-grained localization , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Ning Xu,et al.  Deep Interactive Object Selection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling , 2015, CVPR 2015.

[24]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[25]  Yunchao Wei,et al.  Reversible Recursive Instance-Level Object Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Ismail Ben Ayed,et al.  Volumetric Bias in Segmentation and Reconstruction: Secrets and Solutions , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).