7th AI Driving Olympics: 1st Place Report for Panoptic Tracking

In this technical report, we describe our EfficientLPT architecture, which won the panoptic tracking challenge in the 7th AI Driving Olympics at NeurIPS 2021. Our architecture builds upon the top-down EfficientLPS panoptic segmentation approach. EfficientLPT consists of a shared backbone in which a modified EfficientNet-B5 model with the proximity convolution module serves as the encoder, followed by the range-aware FPN, which aggregates semantically rich, range-aware multi-scale features. Subsequently, we employ two task-specific heads: the scale-invariant semantic head, and a hybrid task cascade with feedback from the semantic head as the instance head. Further, we employ a novel panoptic fusion module to adaptively fuse the logits from both heads to yield the panoptic tracking output. Our approach exploits three consecutive accumulated scans to predict locally consistent panoptic tracking IDs, and uses the overlap between the scans to predict globally consistent panoptic tracking IDs for a given sequence. The benchmarking results from the 7th AI Driving Olympics at NeurIPS 2021 show that our model ranked #1 for the panoptic tracking task on the Panoptic nuScenes dataset.
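
The abstract does not spell out how overlap between scans is turned into globally consistent IDs. The following is a minimal sketch of one plausible scheme, not the authors' implementation: it assumes that two consecutive windows share a scan, that per-point instance IDs are available for the shared points from both windows, and that a local instance inherits the global ID it overlaps most. The function name `associate_ids`, the `min_overlap` threshold, and the convention that ID 0 means "no instance" are all hypothetical.

```python
from collections import Counter
from itertools import count


def associate_ids(prev_global, curr_local, next_id, min_overlap=0.5):
    """Map local instance IDs of the current window to global track IDs.

    prev_global and curr_local are per-point ID lists for the SAME points
    (the scan assumed to be shared by two consecutive windows).
    ID 0 means "no instance" (assumed convention).
    """
    # Count how often each (local ID, global ID) pair co-occurs on a point.
    pair_counts = Counter(
        (l, g) for l, g in zip(curr_local, prev_global) if l != 0 and g != 0
    )
    # Number of shared points covered by each local instance.
    local_sizes = Counter(l for l in curr_local if l != 0)

    mapping = {}
    used_globals = set()
    # Greedy matching: strongest overlaps first, each global ID used once.
    for (l, g), n in pair_counts.most_common():
        if l in mapping or g in used_globals:
            continue
        if n / local_sizes[l] >= min_overlap:
            mapping[l] = g
            used_globals.add(g)

    # Local instances without a sufficient match start new tracks.
    for l in sorted(local_sizes):
        if l not in mapping:
            mapping[l] = next(next_id)
    return mapping


# Toy example: eight shared points, two existing tracks (7 and 9).
prev_global = [7, 7, 7, 0, 9, 9, 0, 0]
curr_local = [1, 1, 1, 0, 2, 2, 3, 3]
mapping = associate_ids(prev_global, curr_local, count(10))
print(mapping)
```

In this toy case, local instances 1 and 2 inherit tracks 7 and 9, while local instance 3 has no counterpart in the previous window and receives a fresh ID. A greedy match keeps the sketch short; an optimal assignment (for example, Hungarian matching) would be a natural alternative.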
