论文信息 - Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions

Unifying Instance and Panoptic Segmentation with Dynamic Rank-1 Convolutions

Recently, fully-convolutional one-stage networks have shown superior performance comparing to two-stage frameworks for instance segmentation as typically they can generate higher-quality mask predictions with less computation. In addition, their simple design opens up new opportunities for joint multi-task learning. In this paper, we demonstrate that adding a single classification layer for semantic segmentation, fully-convolutional instance segmentation networks can achieve state-of-the-art panoptic segmentation quality. This is made possible by our novel dynamic rank-1 convolution (DR1Conv), a novel dynamic module that can efficiently merge high-level context information with low-level detailed features which is beneficial for both semantic and instance segmentation. Importantly, the proposed new method, termed DR1Mask, can perform panoptic segmentation by adding a single layer. To our knowledge, DR1Mask is the first panoptic segmentation framework that exploits a shared feature map for both instance and semantic segmentation by considering both efficacy and efficiency. Not only our framework is much more efficient -- twice as fast as previous best two-branch approaches, but also the unified framework opens up opportunities for using the same context module to improve the performance for both tasks. As a byproduct, when performing instance segmentation alone, DR1Mask is 10% faster and 1 point in mAP more accurate than previous state-of-the-art instance segmentation network BlendMask. Code is available at: this https URL

[1] George Papandreou,et al. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation , 2018, ECCV.

[2] Kevin Duh,et al. DyNet: The Dynamic Neural Network Toolkit , 2017, ArXiv.

[3] Xia Li,et al. SOGNet: Scene Overlap Graph Network for Panoptic Segmentation , 2019, AAAI.

[4] Hong Liu,et al. Spatial Pyramid Based Graph Reasoning for Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5] Dustin Tran,et al. BatchEnsemble: An Alternative Approach to Efficient Ensemble and Lifelong Learning , 2020, ICLR.

[6] Yukun Zhu,et al. Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation , 2020, ECCV.

[7] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8] Han Fang,et al. Linformer: Self-Attention with Linear Complexity , 2020, ArXiv.

[9] Pietro Perona,et al. Microsoft COCO: Common Objects in Context , 2014, ECCV.

[10] Jiaying Liu,et al. Adaptive Batch Normalization for practical domain adaptation , 2018, Pattern Recognit..

[11] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Xiangyu Zhang,et al. Learning Dynamic Routing for Semantic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13] Hao Chen,et al. BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[14] Hao Chen,et al. FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[15] Qun Liu,et al. DynaBERT: Dynamic BERT with Adaptive Width and Depth , 2020, NeurIPS.

[16] Aaron C. Courville,et al. FiLM: Visual Reasoning with a General Conditioning Layer , 2017, AAAI.

[17] Lukasz Kaiser,et al. Rethinking Attention with Performers , 2020, ArXiv.

[18] Carsten Rother,et al. Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19] Ross B. Girshick,et al. Mask R-CNN , 2017, 1703.06870.

[20] Jasper Snoek,et al. Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors , 2020, ICML.

[21] Abhinav Gupta,et al. Non-local Neural Networks , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22] Yi Zhang,et al. PSANet: Point-wise Spatial Attention Network for Scene Parsing , 2018, ECCV.

[23] Min Bai,et al. UPSNet: A Unified Panoptic Segmentation Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24] Thomas S. Huang,et al. Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[25] Ali Farhadi,et al. Supermasks in Superposition , 2020, NeurIPS.

[26] Hao Chen,et al. Conditional Convolutions for Instance Segmentation , 2020, ECCV.

[27] Yong Jae Lee,et al. YOLACT: Real-Time Instance Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28] Xiaogang Wang,et al. Context Encoding for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.