Domain Adaptive and Generalizable Network Architectures and Training Strategies for Semantic Image Segmentation

Unsupervised domain adaptation (UDA) and domain generalization (DG) enable machine learning models trained on a source domain to perform well on unlabeled or even unseen target domains. Since previous UDA and DG semantic segmentation methods are mostly based on outdated networks, we benchmark more recent architectures, reveal the potential of Transformers, and design the DAFormer network tailored for UDA and DG. DAFormer is enabled by three training strategies that avoid overfitting to the source domain: (1) Rare Class Sampling mitigates the bias toward common source-domain classes, while (2) a Thing-Class ImageNet Feature Distance and (3) a learning rate warmup promote feature transfer from ImageNet pretraining. Because UDA and DG are usually GPU-memory intensive, most previous methods downscale or crop images. However, low-resolution predictions often fail to preserve fine details, while models trained on cropped images fall short in capturing long-range, domain-robust context information. Therefore, we propose HRDA, a multi-resolution framework for UDA and DG that uses a learned scale attention to combine the strengths of small high-resolution crops, which preserve fine segmentation details, and large low-resolution crops, which capture long-range context dependencies. DAFormer and HRDA improve the state of the art in UDA and DG by more than 10 mIoU on 5 different benchmarks. The implementation is available at https://github.com/lhoyer/HRDA.
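
Two of the mechanisms named above can be illustrated with a minimal NumPy sketch. It is not the released implementation, and the formulas are our assumptions. For Rare Class Sampling, we assume that class c is sampled with a temperature-scaled softmax over (1 - f_c), where f_c is the fraction of source pixels belonging to class c. A lower frequency therefore yields a higher sampling probability. For the HRDA fusion, we assume a per-pixel attention a = sigmoid(s), where s is predicted from the low-resolution context crop. The fused output is (1 - a) * up(y_LR) + a * y_HR. To keep the example short, the high-resolution crop is assumed to cover the same area as the upsampled context prediction, and nearest-neighbour upsampling stands in for the bilinear interpolation a real model would likely use. The function names (`rare_class_probs`, `fuse_scales`) and the default temperature are hypothetical.

```python
import numpy as np


def rare_class_probs(class_pixel_freq, temperature=0.01):
    """Per-class sampling probabilities that favour rare classes.

    Assumption (not stated in the abstract): softmax over (1 - f_c) / T,
    where f_c is the fraction of source pixels of class c and T a temperature.
    """
    f = np.asarray(class_pixel_freq, dtype=np.float64)
    logits = (1.0 - f) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_scales(logits_lr, logits_hr, attn_logits_lr, factor=2):
    """Fuse a low-resolution context prediction with a high-resolution detail prediction.

    logits_lr:      (C, H/factor, W/factor) class scores from the downscaled context crop
    logits_hr:      (C, H, W) class scores from the high-resolution crop
    attn_logits_lr: (1, H/factor, W/factor) raw scale-attention scores (hypothetical head output)
    Returns (C, H, W): (1 - a) * up(logits_lr) + a * logits_hr, with a = sigmoid(up(attn)).
    """
    up_lr = upsample_nearest(logits_lr, factor)
    a = 1.0 / (1.0 + np.exp(-upsample_nearest(attn_logits_lr, factor)))
    # The (1, H, W) attention map broadcasts over the class dimension.
    return (1.0 - a) * up_lr + a * logits_hr
```

With zero attention scores, a = 0.5 everywhere, so the two scales are weighted equally. Large positive scores let the high-resolution prediction dominate, for example on small objects. Large negative scores defer to the context prediction, for example on large regions that benefit from long-range context.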
