Real-Time Panoptic Segmentation with Prototype Masks for Automated Driving

In this paper, we propose a fast, fully convolutional neural network for panoptic segmentation that provides an accurate semantic and instance-level representation of the environment in 2D space. We treat panoptic segmentation as a dense classification problem and generate masks for stuff classes as well as for each instance of things classes. Our network employs a shared backbone and a Feature Pyramid Network for multi-scale feature extraction, which we extend with dual decoders that learn background-specific and foreground-specific masks. Guided by object proposals, the panoptic head assembles location-sensitive prototype masks using a learned weighting scheme. Our solution runs in real time, taking 82 ms on high-resolution images, which makes it suitable for robotic applications and automated driving. Extensive experiments on the Cityscapes dataset demonstrate that our panoptic segmentation network is robust and accurate, achieving 57.3% PQ and 76.9% mIoU.
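To make the idea of assembling prototype masks with learned weights more concrete, the following is a minimal NumPy sketch of one common formulation: each instance mask is a weighted sum of shared prototype maps, squashed with a sigmoid and restricted to the proposal box. This is an illustration under assumptions, not the network's actual panoptic head: the function name `assemble_instance_masks`, the tensor layouts, the sigmoid activation, the 0.5 threshold, and the box cropping are all hypothetical choices made for the example.

```python
import numpy as np


def assemble_instance_masks(prototypes, coefficients, boxes, threshold=0.5):
    """Combine shared prototype maps into per-instance binary masks.

    Hypothetical layout (an assumption for this sketch):
      prototypes:   (H, W, K) float array of K prototype maps
      coefficients: (N, K) float array, one weight vector per proposal
      boxes:        (N, 4) array of proposal boxes as x1, y1, x2, y2 in pixels
    Returns an (H, W, N) boolean array.
    """
    # Weighted sum of prototypes per instance: (H, W, K) @ (K, N) -> (H, W, N)
    logits = prototypes @ coefficients.T
    probs = 1.0 / (1.0 + np.exp(-logits))

    # Build a per-instance "inside the box" indicator via broadcasting.
    h, w, _ = prototypes.shape
    ys = np.arange(h)[:, None, None]
    xs = np.arange(w)[None, :, None]
    x1, y1, x2, y2 = (boxes[:, i][None, None, :] for i in range(4))
    inside = (xs >= x1) & (xs < x2) & (ys >= y1) & (ys < y2)

    # Keep only confident pixels that fall inside the proposal box.
    return (probs > threshold) & inside


# Toy usage: two constant prototypes, two proposals.
prototypes = np.stack([np.ones((4, 4)), -np.ones((4, 4))], axis=-1)
coefficients = np.array([[5.0, 0.0], [0.0, 5.0]])
boxes = np.array([[0, 0, 2, 2], [0, 0, 4, 4]])
masks = assemble_instance_masks(prototypes, coefficients, boxes)
```

In this toy setup the first proposal selects the positive prototype and is cropped to its 2x2 box, while the second selects the negative prototype and yields an empty mask. Sharing the prototypes across all proposals keeps the per-instance cost down to a small matrix product, which is one reason this family of designs is attractive for real-time use.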
