From colouring-in to pointillism: revisiting semantic segmentation supervision

The prevailing paradigm for producing semantic segmentation training data relies on densely labelling each pixel of each image in the training set, akin to colouring-in books. This approach becomes a bottleneck when scaling up in the number of images, classes, and annotators. Here we propose instead a pointillist approach for semantic segmentation annotation, where only point-wise yes/no questions are answered. We explore design alternatives for such an active learning approach, measure the speed and consistency of human annotators on this task, show that this strategy enables training good segmentation models, and that it is suitable for evaluating models at test time. As concrete proof of the scalability of our method, we collected and released 22.6M point labels over 4,171 classes on the Open Images dataset. Our results enable to rethink the semantic segmentation pipeline of annotation, training, and evaluation from a pointillism point of view.

[1]  Lingxi Xie,et al.  Active Pointly-Supervised Instance Segmentation , 2022, ECCV.

[2]  Yin Cui,et al.  Scaling Open-Vocabulary Image Segmentation with Image-Level Labels , 2021, ECCV.

[3]  Alexander S. Ecker,et al.  Image Segmentation Using Text and Image Prompts , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Dengxin Dai,et al.  Decoupling Zero-Shot Semantic Segmentation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Nir Zabari,et al.  Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples , 2021, ArXiv.

[6]  Tongliang Liu,et al.  CRIS: CLIP-Driven Referring Image Segmentation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Julien Mairal,et al.  Emerging Properties in Self-Supervised Vision Transformers , 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV).

[8]  Omkar M. Parkhi,et al.  Pointly-Supervised Instance Segmentation , 2021, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Samuel Albanie,et al.  All you need are a few pixels: semantic segmentation with PixelPick , 2021, 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW).

[10]  Ilya Sutskever,et al.  Learning Transferable Visual Models From Natural Language Supervision , 2021, ICML.

[11]  Quoc V. Le,et al.  Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision , 2021, ICML.

[12]  B. S. Manjunath,et al.  PCAMs: Weakly Supervised Semantic Segmentation Using Point Supervision , 2020, ArXiv.

[13]  Jianfeng Gao,et al.  Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks , 2020, ECCV.

[14]  Xilin Chen,et al.  Self-Supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Geoffrey E. Hinton,et al.  A Simple Framework for Contrastive Learning of Visual Representations , 2020, ICML.

[16]  Jordi Pont-Tuset,et al.  Connecting Vision and Language with Localized Narratives , 2019, ECCV.

[17]  Laurens van der Maaten,et al.  Self-Supervised Learning of Pretext-Invariant Representations , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Kavita Bala,et al.  Block Annotation: Better Image Annotation With Sub-Image Decomposition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Furu Wei,et al.  VL-BERT: Pre-training of Generic Visual-Linguistic Representations , 2019, ICLR.

[20]  Cho-Jui Hsieh,et al.  VisualBERT: A Simple and Performant Baseline for Vision and Language , 2019, ArXiv.

[21]  Stefan Lee,et al.  ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks , 2019, NeurIPS.

[22]  Chi-Keung Tang,et al.  FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Mark W. Schmidt,et al.  Instance Segmentation with Point Supervision , 2019, ArXiv.

[24]  Jordi Pont-Tuset,et al.  Natural Vocabulary Emerges from Free-Form Annotations , 2019, ArXiv.

[25]  Ross B. Girshick,et al.  LVIS: A Dataset for Large Vocabulary Instance Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Michael Gygli,et al.  Efficient Object Annotation via Speaking and Pointing , 2019, International Journal of Computer Vision.

[27]  Suha Kwak,et al.  Weakly Supervised Learning of Instance Segmentation With Inter-Pixel Relations , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Natasha Noy,et al.  Industry-scale Knowledge Graphs: Lessons and Challenges , 2019, ACM Queue.

[29]  Trevor Darrell,et al.  Variational Adversarial Active Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[30]  Rodrigo Benenson,et al.  Large-Scale Interactive Object Segmentation With Human Annotators , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Carsten Rother,et al.  CEREALS - Cost-Effective REgion-based Active Learning for Semantic Segmentation , 2018, BMVC.

[32]  Jeff B. Pelz,et al.  SNAG: Spoken Narratives and Gaze Dataset , 2018, ACL.

[33]  Andreas Nürnberger,et al.  The Power of Ensembles for Active Learning in Image Classification , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Frédo Durand,et al.  On the Importance of Label Quality for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[35]  Kaiming He,et al.  Exploring the Limits of Weakly Supervised Pretraining , 2018, ECCV.

[36]  Sanja Fidler,et al.  Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++ , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  George Papandreou,et al.  Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[38]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Luc Van Gool,et al.  Deep Extreme Cut: From Extreme Points to Object Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40]  Luc Van Gool,et al.  Object Referring in Visual Scene with Spoken Language , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[41]  Frank Keller,et al.  Extreme Clicking for Efficient Object Annotation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[42]  Silvio Savarese,et al.  Active Learning for Convolutional Neural Networks: A Core-Set Approach , 2017, ICLR.

[43]  Sabine Süsstrunk,et al.  Webly Supervised Semantic Segmentation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Lin Yang,et al.  Suggestive Annotation: A Deep Active Learning Framework for Biomedical Image Segmentation , 2017, MICCAI.

[45]  Seong Joon Oh,et al.  Exploiting Saliency for Object Segmentation from Image Level Labels , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Vittorio Ferrari,et al.  COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[47]  Bastian Leibe,et al.  Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Li Fei-Fei,et al.  Crowdsourcing in Computer Vision , 2016, Found. Trends Comput. Graph. Vis..

[49]  Bolei Zhou,et al.  Semantic Understanding of Scenes Through the ADE20K Dataset , 2016, International Journal of Computer Vision.

[50]  Jordi Pont-Tuset,et al.  Supervised Evaluation of Image Segmentation and Object Proposal Techniques , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[51]  Kristen Grauman,et al.  Active Image Segmentation Propagation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Jian Sun,et al.  ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Weifeng Chen,et al.  Single-Image Depth Perception in the Wild , 2016, NIPS.

[54]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[55]  Bernt Schiele,et al.  Simple Does It: Weakly Supervised Instance and Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[56]  Christoph H. Lampert,et al.  Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation , 2016, ECCV.

[57]  Ning Xu,et al.  Deep Interactive Object Selection , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[58]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[59]  Carsten Rother,et al.  DenseCut: Densely Connected CRFs for Realtime GrabCut , 2015, Comput. Graph. Forum.

[60]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[61]  Fei-Fei Li,et al.  What's the Point: Semantic Segmentation with Point Supervision , 2015, ECCV.

[62]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[63]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[64]  Ali Farhadi,et al.  Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[65]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[66]  Michael S. Bernstein,et al.  Scalable multi-label annotation , 2014, CHI.

[67]  Noah Snavely,et al.  OpenSurfaces , 2013, ACM Trans. Graph..

[68]  Cristian Sminchisescu,et al.  Composite Statistical Inference for Semantic Segmentation , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[69]  Joachim M. Buhmann,et al.  Active learning for semantic segmentation with expected change , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[70]  Didier Stricker,et al.  CoVidA: pen-based collaborative video annotation , 2012, VIGTA '12.

[71]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[72]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[73]  Pietro Perona,et al.  The Multidimensional Wisdom of Crowds , 2010, NIPS.

[74]  Fei-Fei Li,et al.  What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.

[75]  Jiebo Luo,et al.  iCoseg: Interactive co-segmentation with intelligent scribble guidance , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[76]  Nikolaos Papanikolopoulos,et al.  Multi-class active learning for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[77]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[78]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[79]  Pietro Perona,et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[80]  Jitendra Malik,et al.  Learning to detect natural image boundaries using local brightness, color, and texture cues , 2004, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[81]  Laura A. Dabbish,et al.  Labeling images with a computer game , 2004, AAAI Spring Symposium: Knowledge Collection from Volunteer Contributors.

[82]  David A. Cohn,et al.  Improving generalization with active learning , 1994, Machine Learning.

[83]  I. Biederman Recognition-by-components: a theory of human image understanding. , 1987, Psychological review.

[84]  M. Kendall The treatment of ties in ranking problems. , 1945, Biometrika.

[85]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[86]  John R. Smith,et al.  How Many Visual Concepts? , 2014, IEEE Multim..

[87]  Gabriela Csurka,et al.  What is a good evaluation measure for semantic segmentation? , 2013, BMVC.

[88]  Burr Settles,et al.  Active Learning Literature Survey , 2009 .

[89]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.