Road scenes segmentation across different domains by disentangling latent representations

Deep learning models obtain impressive accuracy in road scenes understanding, however they need a large quantity of labeled samples for their training. Additionally, such models do not generalize well to environments where the statistical properties of data do not perfectly match those of training scenes, and this can be a significant problem for intelligent vehicles. Hence, domain adaptation approaches have been introduced to transfer knowledge acquired on a label-abundant source domain to a related label-scarce target domain. In this work, we design and carefully analyze multiple latent spaceshaping regularization strategies that work together to reduce the domain shift. More in detail, we devise a feature clustering strategy to increase domain alignment, a feature perpendicularity constraint to space apart features belonging to different semantic classes, including those not present in the current batch, and a feature norm alignment strategy to separate active and inactive channels. In addition, we propose a novel evaluation metric to capture the relative performance of an adapted model with respect to supervised training. We validate our framework in driving scenarios, considering both synthetic-to-real and realto-real adaptation, outperforming previous feature-level state-ofthe-art methods on multiple road scenes benchmarks.

[1]  Gianluca Agresti,et al.  Unsupervised Domain Adaptation for Semantic Segmentation of Urban Scenes , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[2]  Lennart Svensson,et al.  DACS: Domain Adaptation via Cross-domain Mixed Sampling , 2020, ArXiv.

[3]  Pietro Zanuttigh,et al.  Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training , 2020, 2020 25th International Conference on Pattern Recognition (ICPR).

[4]  Yi-Hsuan Tsai,et al.  Domain Adaptation for Structured Output via Discriminative Patch Representations , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[5]  Lei Tian,et al.  Domain Adaptation by Class Centroid Matching and Local Manifold Self-Learning , 2020, IEEE Transactions on Image Processing.

[6]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Fabio Pizzati,et al.  Domain Bridge for Unpaired Image-to-Image Translation and Unsupervised Domain Adaptation , 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV).

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Min Sun,et al.  No More Discrimination: Cross City Adaptation of Road Scene Segmenters , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[10]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Germán Ros,et al.  CARLA: An Open Urban Driving Simulator , 2017, CoRL.

[12]  Patrick Pérez,et al.  ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[14]  Pietro Zanuttigh,et al.  Continual Semantic Segmentation via Repulsion-Attraction of Sparse and Disentangled Latent Representations , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Tieniu Tan,et al.  Distant Supervised Centroid Shift: A Simple and Efficient Approach to Visual Domain Adaptation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[16]  Swami Sankaranarayanan,et al.  Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17]  Deng Cai,et al.  Domain Adaptation for Semantic Segmentation With Maximum Squares Loss , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Namil Kim,et al.  Drop to Adapt: Learning Discriminative Features for Unsupervised Domain Adaptation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[19]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20]  Xiaofeng Liu,et al.  Confidence Regularized Self-Training , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[21]  Ming-Hsuan Yang,et al.  CrDoCo: Pixel-Level Domain Transfer With Cross-Domain Consistency , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[23]  Toby P. Breckon,et al.  Unsupervised Domain Adaptation via Structured Prediction Based Selective Pseudo-Labeling , 2019, AAAI.

[24]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Yi Yang,et al.  Contrastive Adaptation Network for Unsupervised Domain Adaptation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[26]  Il-Chul Moon,et al.  Adversarial Dropout for Supervised and Semi-supervised Learning , 2017, AAAI.

[27]  George Papandreou,et al.  Rethinking Atrous Convolution for Semantic Image Segmentation , 2017, ArXiv.

[28]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[29]  Eric P. Xing,et al.  Few-Shot Semantic Segmentation with Prototype Learning , 2018, BMVC.

[30]  Trevor Darrell,et al.  FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation , 2016, ArXiv.

[31]  Pedro H. O. Pinheiro,et al.  Unsupervised Domain Adaptation with Similarity Learning , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[32]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[33]  Gianluca Agresti,et al.  Synth . segmentation Real segmentation Synth . GT Synth . RGB Real RGB Fully Convolutional Discriminator synthetic path real path Region Growing , 2019 .

[34]  Pietro Zanuttigh,et al.  Latent Space Regularization for Unsupervised Domain Adaptation in Semantic Segmentation , 2021, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[35]  Yanjun Wu,et al.  Spatial Attention Pyramid Network for Unsupervised Domain Adaptation , 2020, ECCV.

[36]  Ming-Hsuan Yang,et al.  Learning to Adapt Structured Output Space for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37]  Peter Kontschieder,et al.  The Mapillary Vistas Dataset for Semantic Understanding of Street Scenes , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Yang Zou,et al.  Domain Adaptation for Semantic Segmentation via Class-Balanced Self-Training , 2018, ArXiv.

[39]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[40]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[41]  C. V. Jawahar,et al.  IDD: A Dataset for Exploring Problems of Autonomous Navigation in Unconstrained Environments , 2018, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV).

[42]  Pavan Turaga,et al.  Role of Orthogonality Constraints in Improving Properties of Deep Networks for Image Classification , 2020, ArXiv.

[43]  Kate Saenko,et al.  Adversarial Dropout Regularization , 2017, ICLR.

[44]  David J. Kriegman,et al.  Image to Image Translation for Domain Adaptation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[45]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[46]  Roberto Cipolla,et al.  Semantic object classes in video: A high-definition ground truth database , 2009, Pattern Recognit. Lett..

[47]  Gianluca Agresti,et al.  Unsupervised Domain Adaptation for Mobile Semantic Segmentation based on Cycle Consistency and Feature Alignment , 2020, Image Vis. Comput..

[48]  Pietro Zanuttigh,et al.  Unsupervised Domain Adaptation in Semantic Segmentation: a Review , 2020, ArXiv.

[49]  Hau-San Wong,et al.  Improving Domain-Specific Classification by Collaborative Learning with Adaptation Networks , 2019, AAAI.

[50]  Jiashi Feng,et al.  PANet: Few-Shot Image Semantic Segmentation With Prototype Alignment , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[51]  Thomas G. Dietterich,et al.  Benchmarking Neural Network Robustness to Common Corruptions and Surface Variations , 2018, 1807.01697.

[52]  Pietro Zanuttigh,et al.  Unsupervised Domain Adaptation in Semantic Segmentation via Orthogonal and Clustered Embeddings , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[53]  Philip David,et al.  Domain Adaptation for Semantic Segmentation of Urban Scenes , 2017 .

[54]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[55]  Philip David,et al.  A Curriculum Domain Adaptation Approach to the Semantic Segmentation of Urban Scenes , 2018, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[56]  Jingang Tan,et al.  SSF-DAN: Separated Semantic Feature Based Domain Adaptation Network for Semantic Segmentation , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[57]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.