Importance-Aware Semantic Segmentation in Self-Driving with Discrete Wasserstein Training

Semantic segmentation (SS) is an important perception task for self-driving cars and robotics: it classifies each pixel into one of a set of predefined classes. Deep networks trained with the widely used cross-entropy (CE) loss have achieved significant progress in terms of mean Intersection-over-Union (mIoU). However, the CE loss cannot account for the different importance of each class in a self-driving system. For example, pedestrians in an image matter far more than the surrounding buildings when making driving decisions, so their segmentation should be as accurate as possible. In this paper, we propose to incorporate importance-aware inter-class correlation into a Wasserstein training framework by configuring its ground distance matrix. The ground distance matrix can be predefined from prior knowledge of a specific task, and previous importance-ignoring methods become special cases. From an optimization perspective, we also extend our ground metric to linear, convex, or concave increasing functions of the predefined ground distance. We evaluate our method on the CamVid and Cityscapes datasets with different backbones (SegNet, ENet, FCN, and DeepLab) in a plug-and-play fashion. In extensive experiments, the Wasserstein loss achieves superior segmentation performance on the predefined critical classes for safe driving.
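The abstract does not spell out the loss, so what follows is only a minimal NumPy sketch of the idea under stated assumptions. It relies on a standard fact: if a pixel's label is one-hot at class y, every unit of predicted probability mass must be transported to y, so the Wasserstein distance has the closed form W(p, y) = Σ_i p_i f(D[i, y]), where D is the ground distance matrix and f is an increasing ground metric. The importance-aware construction of D (the cost of confusing two classes is the larger of their importance weights), the example weights, and the choice of squaring and square root as the convex and concave ground metrics are hypothetical illustrations, not the paper's configuration. A uniform matrix D = 1 - I charges every wrong class the same cost, which is the importance-ignoring setting.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax (one row per pixel) with max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def importance_ground_matrix(importance: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL ground distance: confusing classes i and j costs
    max(importance[i], importance[j]); correct predictions cost 0."""
    d = np.maximum.outer(importance, importance).astype(float)
    np.fill_diagonal(d, 0.0)
    return d


# Increasing transforms f of the ground distance (illustrative choices only).
GROUND_METRICS = {
    "linear": lambda d: d,
    "convex": lambda d: d ** 2,
    "concave": np.sqrt,
}


def wasserstein_onehot(probs: np.ndarray, labels: np.ndarray,
                       ground: np.ndarray, metric: str = "linear") -> np.ndarray:
    """Per-pixel Wasserstein distance between predicted distributions
    probs (N, C) and one-hot labels (N,). Because all mass must move to
    the true class y, W = sum_i p_i * f(D[i, y])."""
    cost = GROUND_METRICS[metric](ground)   # (C, C) transformed ground metric
    per_pixel_cost = cost[:, labels].T      # (N, C): column y_n of f(D) per pixel
    return (probs * per_pixel_cost).sum(axis=1)


if __name__ == "__main__":
    # Hypothetical 3-class example: 0 = road, 1 = building, 2 = pedestrian.
    D = importance_ground_matrix(np.array([1.0, 1.0, 3.0]))
    probs = np.array([[0.3, 0.0, 0.7],   # true class: pedestrian
                      [0.3, 0.7, 0.0]])  # true class: building
    labels = np.array([2, 1])
    print(np.round(wasserstein_onehot(probs, labels, D), 4))  # [0.9 0.3]
    # Training loss: mean over pixels of the per-pixel distance.
    print(wasserstein_onehot(probs, labels, D, metric="convex").mean())
```

In the usage example both pixels leak the same 0.3 probability to "road", yet the error on the pedestrian pixel costs three times as much, whereas with the uniform matrix 1 - I both would cost 0.3. The convex transform widens that gap further and the concave one narrows it, which is one way to read the abstract's linear/convex/concave extension.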
