Weakly- and Semi-Supervised Panoptic Segmentation

We present a weakly supervised model that jointly performs both semantic- and instance-segmentation – a particularly relevant problem given the substantial cost of obtaining pixel-perfect annotation for these tasks. In contrast to many popular instance segmentation approaches based on object detectors, our method does not predict any overlapping instances. Moreover, we are able to segment both “thing” and “stuff” classes, and thus explain all the pixels in the image. “Thing” classes are weakly-supervised with bounding boxes, and “stuff” with image-level tags. We obtain state-of-the-art results on Pascal VOC, for both full and weak supervision (which achieves about 95% of fully-supervised performance). Furthermore, we present the first weakly-supervised results on Cityscapes for both semantic- and instance-segmentation. Finally, we use our weakly supervised framework to analyse the relationship between annotation quality and predictive performance, which is of interest to dataset creators.

[1]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[2]  Bastian Leibe,et al.  Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Vladlen Koltun,et al.  Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials , 2011, NIPS.

[4]  Ming-Hsuan Yang,et al.  Multi-instance object segmentation with occlusion handling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Sergio Guadarrama,et al.  Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[7]  Subhransu Maji,et al.  Semantic contours from inverse detectors , 2011, 2011 International Conference on Computer Vision.

[8]  Richard S. Zemel,et al.  End-to-End Instance Segmentation with Recurrent Attention , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  George Papandreou,et al.  Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation , 2015, ArXiv.

[10]  Hong Zhang,et al.  Multi-scale Patch Aggregation (MPA) for Simultaneous Detection and Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jian Sun,et al.  ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Frank Keller,et al.  Training Object Class Detectors from Eye Tracking Data , 2014, ECCV.

[13]  Richard S. Zemel,et al.  Learning Hybrid Models for Image Annotation with Partially Labeled Data , 2008, NIPS.

[14]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[15]  Shu Liu,et al.  Path Aggregation Network for Instance Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[17]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Edward H. Adelson,et al.  On seeing stuff: the perception of materials by humans and machines , 2001, IS&T/SPIE Electronic Imaging.

[19]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Min Bai,et al.  Deep Watershed Transform for Instance Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Yao Zhao,et al.  Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Chen Sun,et al.  Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[23]  Li Xu,et al.  Hierarchical Image Saliency Detection on Extended CSSD , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[25]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Abhishek Das,et al.  Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[27]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[28]  Zhe L. Lin,et al.  Top-Down Neural Attention by Excitation Backprop , 2016, International Journal of Computer Vision.

[29]  Frank Keller,et al.  Extreme Clicking for Efficient Object Annotation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  H. J. Scudder,et al.  Probability of error of some adaptive pattern-recognition machines , 1965, IEEE Trans. Inf. Theory.

[31]  Trevor Darrell,et al.  Constrained Convolutional Neural Networks for Weakly Supervised Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[32]  Seong Joon Oh,et al.  Exploiting Saliency for Object Segmentation from Image Level Labels , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[34]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Jian Sun,et al.  BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[36]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Sanja Fidler,et al.  SGN: Sequential Grouping Networks for Instance Segmentation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Bernt Schiele,et al.  Simple Does It: Weakly Supervised Instance and Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39]  Jitendra Malik,et al.  Simultaneous Detection and Segmentation , 2014, ECCV.

[40]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Philip H. S. Torr,et al.  Discovering Class-Specific Pixels for Weakly-Supervised Semantic Segmentation , 2017, BMVC.

[42]  Yi Yang,et al.  Layered Object Models for Image Segmentation , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[43]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[44]  Christoph H. Lampert,et al.  Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation , 2016, ECCV.

[45]  Luc Van Gool,et al.  Semantic Instance Segmentation with a Discriminative Loss Function , 2017, ArXiv.

[46]  Luc Van Gool,et al.  Semantic Instance Segmentation for Autonomous Driving , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[47]  Ross B. Girshick,et al.  Mask R-CNN , 2017, 1703.06870.

[48]  Philip H. S. Torr,et al.  Higher Order Conditional Random Fields in Deep Neural Networks , 2015, ECCV.

[49]  Ronan Collobert,et al.  From image-level to pixel-level labeling with Convolutional Networks , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Vibhav Vineet,et al.  Conditional Random Fields as Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[51]  Philip H. S. Torr,et al.  Pixelwise Instance Segmentation with a Dynamically Instantiated Network , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Yi Li,et al.  Fully Convolutional Instance-Aware Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[54]  Philip H. S. Torr,et al.  Holistic, Instance-Level Human Parsing , 2017, BMVC.

[55]  Hayit Greenspan,et al.  Finding Pictures of Objects in Large Collections of Images , 1996, Object Representation in Computer Vision.

[56]  Fei-Fei Li,et al.  What's the Point: Semantic Segmentation with Point Supervision , 2015, ECCV.

[57]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[58]  Thomas Brox,et al.  Pixel-Level Encoding and Depth Layering for Instance-Level Semantic Labeling , 2016, GCPR.

[59]  Philip H. S. Torr,et al.  Bottom-up Instance Segmentation using Deep Higher-Order CRFs , 2016, BMVC.

[60]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[61]  Carsten Rother,et al.  Conditional Random Fields Meet Deep Neural Networks for Semantic Segmentation: Combining Probabilistic Graphical Models with Deep Learning for Structured Prediction , 2018, IEEE Signal Processing Magazine.

[62]  Yunchao Wei,et al.  Proposal-Free Network for Instance-Level Object Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[64]  Bill Triggs,et al.  Scene Segmentation with CRFs Learned from Partially Labeled Images , 2007, NIPS.

[65]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[66]  Huchuan Lu,et al.  Saliency Detection via Graph-Based Manifold Ranking , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[67]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[68]  Carsten Rother,et al.  InstanceCut: From Edges to Instances with MultiCut , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[69]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[70]  Trevor Darrell,et al.  Learning to Segment Every Thing , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[71]  Xinlei Chen,et al.  PixelNet: Representation of the pixels, by the pixels, and for the pixels , 2017, ArXiv.

[72]  Sanja Fidler,et al.  Monocular Object Instance Segmentation and Depth Ordering with CNNs , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).