A Unified Neural Network for Panoptic Segmentation

In this paper, we propose a unified neural network for panoptic segmentation, a task aiming to achieve more fine‐grained segmentation. Following existing methods combining semantic and instance segmentation, our method relies on a triple‐branch neural network for tackling the unifying work. In the first stage, we adopt a ResNet50 with a feature pyramid network (FPN) as shared backbone to extract features. Then each branch leverages the shared feature maps and serves as the stuff, things, or mask branch. Lastly, the outputs are fused following a well‐designed strategy. Extensive experimental results on MS‐COCO dataset demonstrate that our approach achieves a competitive Panoptic Quality (PQ) metric score with the state of the art.

[1]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[3]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[4]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[5]  Guan Huang,et al.  Attention-Guided Unified Network for Panoptic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[7]  Jian Sun,et al.  Instance-Aware Semantic Segmentation via Multi-task Network Cascades , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Tianqi Chen,et al.  Empirical Evaluation of Rectified Activations in Convolutional Network , 2015, ArXiv.

[9]  Carsten Rother,et al.  InstanceCut: From Edges to Instances with MultiCut , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Min Bai,et al.  Deep Watershed Transform for Instance Segmentation , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jie Li,et al.  Learning to Fuse Things and Stuff , 2018, ArXiv.

[13]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[14]  Min Bai,et al.  UPSNet: A Unified Panoptic Segmentation Network , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Wei Sun,et al.  Methods and datasets on semantic segmentation: A review , 2018, Neurocomputing.

[16]  Roberto Cipolla,et al.  SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Richard Socher,et al.  Improving Generalization Performance by Switching from Adam to SGD , 2017, ArXiv.

[18]  Gijs Dubbelman,et al.  Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network , 2018, ArXiv.

[19]  Xu Liu,et al.  An End-To-End Network for Panoptic Segmentation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).