Panoramic Panoptic Segmentation: Insights Into Surrounding Parsing for Mobile Agents via Unsupervised Contrastive Learning

In this work, we introduce panoramic panoptic segmentation as the most holistic form of scene understanding for standard camera-based input, both in terms of Field of View (FoV) and image-level understanding. A complete understanding of the surroundings gives a mobile agent the maximum amount of information, which is essential for any intelligent vehicle that must make informed decisions in a safety-critical, dynamic environment such as real-world traffic. To overcome the lack of annotated panoramic images, we propose a framework that trains models on standard pinhole images and transfers the learned features to the panoramic domain in a cost-minimizing way. The domain shift from pinhole to panoramic images is non-trivial, as large objects and surfaces are heavily distorted close to the image borders and look different across the two domains. Using our proposed method with dense contrastive learning, we achieve significant improvements over a non-adapted approach: depending on the efficient panoptic segmentation architecture, Panoptic Quality (PQ) improves by 3.5–6.5% over non-adapted models on our established Wild Panoramic Panoptic Segmentation (WildPPS) dataset. Furthermore, our efficient framework does not need access to images of the target domain, making it a feasible domain generalization approach suitable for limited hardware settings. As additional contributions, we publish WildPPS, the first panoramic panoptic image dataset, to foster progress in surrounding perception, and we explore a novel training procedure that combines supervised and contrastive training.
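For readers unfamiliar with the metric reported above, the following is a minimal sketch of how Panoptic Quality can be computed for a single class, using the commonly used definition PQ = (sum of IoUs over matched segment pairs) / (TP + 0.5·FP + 0.5·FN), where a predicted and a ground-truth segment match if their IoU exceeds 0.5. It is not the evaluation code used for WildPPS. The representation of segments as sets of pixel coordinates and the names `iou` and `panoptic_quality` are assumptions made purely for illustration.

```python
from typing import Dict, Hashable, List, Set, Tuple

Pixel = Tuple[int, int]
# Hypothetical representation: segment id -> set of (row, col) pixels.
Segments = Dict[Hashable, Set[Pixel]]


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred: Segments, gt: Segments) -> float:
    """PQ for one class: sum of matched IoUs / (TP + 0.5*FP + 0.5*FN)."""
    matched_ious: List[float] = []
    matched_pred = set()
    for gt_pixels in gt.values():
        for pred_id, pred_pixels in pred.items():
            if pred_id in matched_pred:
                continue
            score = iou(gt_pixels, pred_pixels)
            # With non-overlapping segments, IoU > 0.5 makes the match unique.
            if score > 0.5:
                matched_ious.append(score)
                matched_pred.add(pred_id)
                break
    tp = len(matched_ious)
    fp = len(pred) - tp  # predicted segments without a match
    fn = len(gt) - tp    # ground-truth segments without a match
    denom = tp + 0.5 * fp + 0.5 * fn
    return sum(matched_ious) / denom if denom else 0.0


# Toy example: one good match (IoU 0.75), one missed object, one spurious one.
gt_example = {"car1": {(0, 0), (0, 1), (0, 2), (0, 3)}, "car2": {(2, 0), (2, 1)}}
pred_example = {"a": {(0, 0), (0, 1), (0, 2)}, "b": {(5, 5)}}
pq_example = panoptic_quality(pred_example, gt_example)  # 0.75 / (1 + 0.5 + 0.5)
```

In practice PQ is averaged over classes, so a reported improvement reflects both better segment masks (higher IoU on matches) and fewer missed or spurious segments.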
