Standardized Max Logits: A Simple yet Effective Approach for Identifying Unexpected Road Obstacles in Urban-Scene Segmentation

Identifying unexpected objects on roads in semantic segmentation (e.g., identifying dogs on roads) is crucial in safety-critical applications. Existing approaches use images of unexpected objects from external datasets or require additional training (e.g., retraining segmentation networks or training an extra network), which demands considerable labor or lengthens inference time. One possible alternative is to use the prediction scores of a pre-trained network, such as the max logits (i.e., the maximum values among classes before the final softmax layer), to detect such objects. However, the distribution of max logits differs significantly from one predicted class to another, which degrades the performance of identifying unexpected objects in urban-scene segmentation. To address this issue, we propose a simple yet effective approach that standardizes the max logits in order to align the different distributions and reflect the relative meanings of max logits within each predicted class. Moreover, we consider the local regions from two different perspectives, based on the intuition that neighboring pixels share similar semantic information. In contrast to previous approaches, our method does not utilize any external datasets or require additional training, which makes it widely applicable to existing pre-trained segmentation models. This straightforward approach achieves new state-of-the-art performance on the publicly available Fishyscapes Lost & Found leaderboard by a large margin. Our code is publicly available at this link.
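
The standardization step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released implementation: the function names `fit_class_stats` and `standardized_max_logits`, the `(C, H, W)` logit layout, and the `eps` guard are all hypothetical, and the two local-region steps mentioned in the abstract are omitted. Per-class mean and standard deviation of the max logits are estimated from the pre-trained network's outputs on training images, and each pixel's max logit is then standardized with the statistics of its predicted class.

```python
import numpy as np

def fit_class_stats(logits_list, num_classes, eps=1e-8):
    """Estimate per-class mean/std of max logits (hypothetical helper).

    logits_list: iterable of arrays shaped (C, H, W), assumed to be the
    pre-softmax outputs of a pre-trained segmentation network.
    """
    count = np.zeros(num_classes)
    total = np.zeros(num_classes)
    total_sq = np.zeros(num_classes)
    for logits in logits_list:
        pred = logits.argmax(axis=0).ravel()  # predicted class per pixel
        mx = logits.max(axis=0).ravel()       # max logit per pixel
        count += np.bincount(pred, minlength=num_classes)
        total += np.bincount(pred, weights=mx, minlength=num_classes)
        total_sq += np.bincount(pred, weights=mx ** 2, minlength=num_classes)
    safe = np.maximum(count, 1)
    mean = total / safe
    var = np.maximum(total_sq / safe - mean ** 2, 0.0)
    # Classes never predicted fall back to std = 1 (an assumption of this sketch).
    std = np.where(count > 0, np.sqrt(var), 1.0)
    return mean, np.maximum(std, eps)

def standardized_max_logits(logits, mean, std):
    """Standardize each pixel's max logit by its predicted class's statistics."""
    pred = logits.argmax(axis=0)
    mx = logits.max(axis=0)
    return (mx - mean[pred]) / std[pred]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy logits whose scale differs strongly between classes.
    scale = np.array([1.0, 5.0, 10.0])[:, None, None]
    train = [rng.normal(size=(3, 16, 16)) * scale for _ in range(4)]
    mean, std = fit_class_stats(train, num_classes=3)
    scores = standardized_max_logits(train[0], mean, std)
    print(scores.shape)
```

Under this sketch, pixels with a low standardized score would be the candidates for unexpected objects, since their max logit is unusually small relative to other pixels predicted as the same class.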
