How to Save your Annotation Cost for Panoptic Segmentation?

How to properly reduce the annotation cost for panoptic segmentation? How to leverage and optimize the cost-quality trade-off for training data and model? These questions are key challenges in building a label-efficient and scalable panoptic segmentation system, given its expensive pixel-level instance and semantic annotation requirements. By closely examining different kinds of cheaper labels, we introduce a novel multi-objective framework to automatically determine the allocation of different annotations, so as to reach a better segmentation quality with a lower annotation cost. Specifically, we design a Cost-Quality Balanced Network (CQB-Net) to generate the panoptic segmentation map, which distills the crucial relations between various supervisions including panoptic labels, image-level classification labels, bounding boxes, and the semantic coherence information between the foreground and background. Instead of ad-hoc allocation during training, we formulate the optimization of the cost-quality trade-off as a Multi-Objective Optimization Problem (MOOP). We model the marginal quality improvement of each annotation and approximate the Pareto front to enable a label-efficient allocation ratio. Extensive experiments on the COCO benchmark show the superiority of our method, e.g. achieving a segmentation quality of 43.4% compared to 43.0% for OCFusion while saving 2.4x in annotation cost.

Introduction

Panoptic segmentation unifies foreground instance segmentation (named things) and semantic segmentation on amorphous background regions (named stuff) to generate rich and coherent masks. It poses a challenging problem of holistic image understanding, covering all foreground objects and background contents simultaneously. State-of-the-art methods (Yang et al. 2019b) are trained in a fully-supervised fashion, for which per-pixel background class and foreground instance labels are required.
Due to the data-hungry nature of deep networks, these approaches demand an enormous number of training images with curated ground-truth labels, which are generally produced by hand. However, manual annotation of such labels is extremely labor-intensive and time-consuming, e.g. taking around 19 minutes for a single image in COCO to be fully annotated (Caesar, Uijlings,

∗Corresponding Author. Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
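
The cost-quality trade-off that the abstract frames as a MOOP can be illustrated with a small sketch. The code below is not CQB-Net or the authors' allocation procedure; it only shows, under stated assumptions, what it means for an annotation allocation to lie on a Pareto front when cost is minimized and quality is maximized. The `Allocation` type, the candidate names, and all numbers are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple


class Allocation(NamedTuple):
    """A hypothetical annotation budget split with its measured outcome."""
    name: str
    cost: float     # total annotation cost (arbitrary units, lower is better)
    quality: float  # segmentation quality in percent (higher is better)


def dominates(a: Allocation, b: Allocation) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.cost <= b.cost and a.quality >= b.quality
    strictly_better = a.cost < b.cost or a.quality > b.quality
    return no_worse and strictly_better


def pareto_front(candidates: List[Allocation]) -> List[Allocation]:
    """Return the non-dominated candidates, sorted by increasing cost."""
    front = [c for c in candidates
             if not any(dominates(other, c) for other in candidates)]
    return sorted(front, key=lambda c: c.cost)


# Made-up numbers for illustration only; they are not results from the paper.
candidates = [
    Allocation("all panoptic labels", cost=10.0, quality=43.0),
    Allocation("mostly boxes", cost=4.0, quality=41.5),
    Allocation("boxes + image labels", cost=5.0, quality=40.0),  # dominated
    Allocation("mixed", cost=6.0, quality=42.8),
    Allocation("image labels only", cost=1.0, quality=30.0),
]

front = pareto_front(candidates)
for alloc in front:
    print(f"{alloc.name}: cost={alloc.cost}, quality={alloc.quality}")
```

In this toy setting, "boxes + image labels" is discarded because "mostly boxes" is both cheaper and better, while the remaining four allocations each represent a different, defensible trade-off. Choosing among them still requires a preference between cost and quality, which is the decision the allocation ratio is meant to capture.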

[1]  Trevor Darrell,et al.  Learning to Segment Every Thing , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[2]  Mark W. Schmidt,et al.  Where are the Masks: Instance Segmentation with Image-level Supervision , 2019, BMVC.

[3]  Andrea Vedaldi,et al.  Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Huimin Ma,et al.  Weakly-Supervised Semantic Segmentation by Iteratively Mining Common Object Features , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Frank Keller,et al.  Extreme Clicking for Efficient Object Annotation , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Kai Chen,et al.  MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.

[8]  Kaiming He,et al.  Panoptic Feature Pyramid Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Qi Tian,et al.  Zigzag Learning for Weakly Supervised Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[10]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph.

[11]  Ming-Hsuan Yang,et al.  Adversarial Learning for Semi-supervised Semantic Segmentation , 2018, BMVC.

[12]  Wei Liu,et al.  Deep Self-Taught Learning for Weakly Supervised Object Localization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Guan Huang,et al.  Attention-Guided Unified Network for Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[15]  Bernard Ghanem,et al.  W2F: A Weakly-Supervised to Fully-Supervised Framework for Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Yung-Yu Chuang,et al.  Weakly Supervised Instance Segmentation using the Bounding Box Tightness Prior , 2019, NeurIPS.

[17]  Jie Li,et al.  Real-Time Panoptic Segmentation From Dense Detections , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[18]  Yunchao Wei,et al.  Revisiting Dilated Convolution: A Simple Approach for Weakly- and Semi-Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Min Bai,et al.  UPSNet: A Unified Panoptic Segmentation Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  George Papandreou,et al.  DeeperLab: Single-Shot Image Parser , 2019, ArXiv.

[21]  Bernt Schiele,et al.  Simple Does It: Weakly Supervised Instance and Semantic Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  George Trimponias,et al.  Multi-objective Neural Architecture Search via Non-stationary Policy Gradient , 2020, ArXiv.

[23]  Trevor Darrell,et al.  Fully Convolutional Multi-Class Multiple Instance Learning , 2014, ICLR.

[24]  Pushmeet Kohli,et al.  Efficient Relaxations for Dense CRFs with Sparse Higher Order Potentials , 2019, SIAM J. Imaging Sci..

[25]  Jie Li,et al.  Learning to Fuse Things and Stuff , 2018, ArXiv.

[26]  Changshui Zhang,et al.  Weakly- and Semi-Supervised Object Detection with Expectation-Maximization Algorithm , 2017, ArXiv.

[27]  Qixiang Ye,et al.  Min-Entropy Latent Model for Weakly Supervised Object Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[28]  Ke Gong,et al.  Bidirectional Graph Reasoning Network for Panoptic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Philip H. S. Torr,et al.  Weakly- and Semi-Supervised Panoptic Segmentation , 2022.

[30]  Wenyu Liu,et al.  Multiple Instance Detection Network with Online Instance Classifier Refinement , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Zhili Liu,et al.  EHSOD: CAM-Guided End-to-end Hybrid-Supervised Object Detection with Cascade Refinement , 2020, AAAI.

[32]  Rongrong Ji,et al.  Generative Adversarial Learning Towards Fast Weakly Supervised Detection , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[33]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Yong Jae Lee,et al.  Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[35]  George Papandreou,et al.  Weakly-and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[36]  Suha Kwak,et al.  Learning Pixel-Level Semantic Affinity with Image-Level Supervision for Weakly Supervised Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Mingjie Sun,et al.  Reliability Does Matter: An End-to-End Weakly Supervised Semantic Segmentation Approach , 2019, AAAI.

[38]  Jitendra Malik,et al.  ShapeMask: Learning to Segment Novel Objects by Refining Shape Priors , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[39]  Ann Nowé,et al.  Multi-objective reinforcement learning using sets of pareto dominating policies , 2014, J. Mach. Learn. Res..

[40]  Carsten Rother,et al.  Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Wenyu Liu,et al.  PCL: Proposal Cluster Learning for Weakly Supervised Object Detection , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Lorenzo Porzi,et al.  Seamless Scene Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Yunchao Wei,et al.  STC: A Simple to Complex Framework for Weakly-Supervised Semantic Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  C. Hudelot,et al.  Semi-Supervised Semantic Segmentation With Cross-Consistency Training , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Nuno Vasconcelos,et al.  Cascade R-CNN: Delving Into High Quality Object Detection , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[46]  Xu Liu,et al.  An End-To-End Network for Panoptic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Bernard Ghanem,et al.  Beyond Weakly Supervised: Pseudo Ground Truths Mining for Missing Bounding-Boxes Object Detection , 2020, IEEE Transactions on Circuits and Systems for Video Technology.

[48]  Xia Li,et al.  SOGNet: Scene Overlap Graph Network for Panoptic Segmentation , 2019, AAAI.

[49]  Weilin Huang,et al.  Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[50]  Vittorio Ferrari,et al.  COCO-Stuff: Thing and Stuff Classes in Context , 2016, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.