Paper Information - What's the Point: Semantic Segmentation with Point Supervision

The semantic image segmentation task presents a trade-off between test-time accuracy and training-time annotation cost. Detailed per-pixel annotations enable training accurate models but are very time-consuming to obtain; image-level class labels are an order of magnitude cheaper but result in less accurate models. We take a natural step from image-level annotation towards stronger supervision: we ask annotators to point to an object if one exists. We incorporate this point supervision along with a novel objectness potential in the training loss function of a CNN model. Experimental results on the PASCAL VOC 2012 benchmark reveal that the combined effect of point-level supervision and objectness potential yields an improvement of 12.9% mIOU over image-level supervision. Further, we demonstrate that models trained with point-level supervision are more accurate than models trained with image-level, squiggle-level or full supervision given a fixed annotation budget.
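
The abstract reports its results as mIOU (mean intersection-over-union). As a rough illustration of how that metric is commonly computed, here is a minimal sketch in plain Python. The function name `mean_iou`, the flat label lists, and the `ignore_label=255` convention for void pixels are assumptions made for this example; they are not taken from the paper's code, and the official benchmark evaluation may differ in details such as how absent classes are handled.

```python
def mean_iou(pred, target, num_classes, ignore_label=255):
    """Mean intersection-over-union for flat lists of per-pixel class ids.

    Hypothetical helper for illustration only. Pixels whose ground-truth
    label equals `ignore_label` are skipped (assumed void convention).
    Classes that appear in neither prediction nor ground truth are left
    out of the average.
    """
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        if p == t:
            # Correct pixel: counts once toward both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both the predicted
            # class and the true class, but no intersection.
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Tiny two-class example: four pixels, one of them mislabeled.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
print(round(score, 4))
```

In this toy example class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is their average. An improvement such as the one quoted in the abstract is a difference between two such averages, expressed in percentage points.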
