Deep Extreme Cut: From Extreme Points to Object Segmentation

This paper explores the use of the extreme points of an object (its left-most, right-most, top, and bottom pixels) as input to obtain precise object segmentation in images and videos. We do so by adding an extra channel to the image at the input of a convolutional neural network (CNN); this channel contains a Gaussian centered on each of the extreme points. The CNN learns to transform this information into a segmentation of the object that matches those extreme points. We demonstrate the usefulness of this approach for guided segmentation (GrabCut-style), interactive segmentation, video object segmentation, and dense segmentation annotation. We show that it obtains the most precise results to date, with less user input, on an extensive and varied selection of benchmarks and datasets. All our models and code are publicly available at http://www.vision.ee.ethz.ch/~cvlsegmentation/dextr/.
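The input encoding described above can be illustrated with a minimal NumPy sketch. It is not the authors' released code: the function names, the Gaussian width `sigma=10.0`, the use of a pixel-wise maximum to combine the four Gaussians, and the derivation of extreme points from a binary mask (rather than from user clicks) are all assumptions made for illustration.

```python
import numpy as np


def extreme_points(mask):
    """Return (x, y) of the left-most, right-most, top and bottom pixels
    of a non-empty binary mask. Ties are broken by first occurrence."""
    ys, xs = np.nonzero(mask)
    left = (xs.min(), ys[xs.argmin()])
    right = (xs.max(), ys[xs.argmax()])
    top = (xs[ys.argmin()], ys.min())
    bottom = (xs[ys.argmax()], ys.max())
    return [left, right, top, bottom]


def gaussian_heatmap(shape, points, sigma=10.0):
    """Build a single-channel map with a 2D Gaussian centered on each point.
    sigma is a hypothetical value; overlapping Gaussians are merged by max."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float64)
    for x, y in points:
        g = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g)
    return heat.astype(np.float32)


def add_extreme_point_channel(image, mask, sigma=10.0):
    """Stack the heatmap onto an H x W x 3 image, giving an H x W x 4 input."""
    heat = gaussian_heatmap(mask.shape, extreme_points(mask), sigma)
    return np.dstack([image.astype(np.float32), heat])
```

In this sketch the four-channel array would then be fed to a CNN whose first convolution accepts four input channels; in an interactive setting the points would come from user clicks rather than from a mask.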
