What's the Point: Semantic Segmentation with Point Supervision

The semantic image segmentation task presents a trade-off between test-time accuracy and training-time annotation cost. Detailed per-pixel annotations enable training accurate models but are very time-consuming to obtain; image-level class labels are an order of magnitude cheaper but result in less accurate models. We take a natural step from image-level annotation towards stronger supervision: we ask annotators to point to an object if one exists. We incorporate this point supervision, along with a novel objectness potential, into the training loss function of a CNN model. Experimental results on the PASCAL VOC 2012 benchmark show that the combined effect of point-level supervision and the objectness potential yields an improvement of \(12.9\,\%\) mIOU over image-level supervision. Further, we demonstrate that, given a fixed annotation budget, models trained with point-level supervision are more accurate than models trained with image-level, squiggle-level or full supervision.
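As a rough illustration of how point supervision and an objectness prior could enter a training loss, the sketch below combines a cross-entropy term evaluated only at annotated pixels with a per-pixel objectness term. It is a minimal sketch under stated assumptions, not the paper's exact formulation: the function names, the `weight` parameter, the flat pixel indexing, and the convention that class 0 is background are all hypothetical, and any image-level term is omitted.

```python
import math

EPS = 1e-12  # guards against log(0)


def point_loss(probs, points):
    """Mean cross-entropy over annotated pixels only.

    probs:  list of per-pixel class probability lists (softmax outputs).
    points: dict mapping pixel index -> annotated class label (assumed format).
    """
    if not points:
        return 0.0
    total = 0.0
    for pixel, label in points.items():
        total -= math.log(probs[pixel][label] + EPS)
    return total / len(points)


def objectness_loss(probs, objectness, background=0):
    """Cross-entropy between a per-pixel objectness prior and the
    predicted probability that the pixel belongs to any object class."""
    total = 0.0
    for p, obj in zip(probs, objectness):
        p_bg = p[background]
        p_obj = 1.0 - p_bg  # total mass on all non-background classes
        total -= obj * math.log(p_obj + EPS) + (1.0 - obj) * math.log(p_bg + EPS)
    return total / len(probs)


def combined_loss(probs, points, objectness, weight=1.0):
    """Point term plus a weighted objectness term (weight is an assumption)."""
    return point_loss(probs, points) + weight * objectness_loss(probs, objectness)


# Two pixels, two classes (0 = background, 1 = object); one clicked point.
probs = [[0.9, 0.1], [0.2, 0.8]]
points = {1: 1}
objectness = [0.1, 0.9]
loss = combined_loss(probs, points, objectness)
```

Only the clicked pixel contributes to the point term, so the objectness term is what supplies a training signal at every other pixel.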
