Semantic Distribution-aware Contrastive Adaptation for Semantic Segmentation

Domain adaptive semantic segmentation aims to make accurate dense predictions on a target domain using only pixel-level annotations from a source domain. Current state-of-the-art works suggest that alignment from the category perspective can alleviate domain shift. However, they rely mainly on image-to-image adversarial training and give little consideration to the semantic variations of an object across different images. As a result, such alignment may fail to capture a comprehensive picture of each category, leading to unstable category alignment and limited generalization gains. This motivates us to explore a holistic representation, the semantic distribution of each category in the source domain, to mitigate this problem. In this paper, we present a new semantic distribution-aware contrastive adaptation algorithm, dubbed SDCA, that enables pixel-wise representation alignment across domains under the guidance of the semantic distributions. Specifically, we first design a novel pixel-level contrastive loss that considers the correspondences between the semantic distributions and the pixel-wise representations from both domains. In essence, pixel representations from the same category are encouraged to cluster together and those from different categories to spread apart, which boosts the segmentation capability of the model. Next, we derive an upper bound on this formulation that implicitly involves the simultaneous learning of an infinite number of (dis)similar pixel pairs, making it highly efficient. Though the method is simple, we empirically reveal mechanisms that contribute to the effectiveness of SDCA. Finally, we verify that SDCA can further improve segmentation accuracy when integrated with self-supervised learning. We evaluate the proposed method on multiple benchmarks, achieving considerable improvements over existing algorithms.
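To make the idea of contrasting pixel representations against per-category statistics concrete, the following is a minimal NumPy sketch, not the SDCA loss itself. It assumes that each category's semantic distribution is summarized by its mean feature vector only, and it uses a standard softmax (InfoNCE-style) objective over cosine similarities. The function names, the `temperature` value, and the toy data are illustrative assumptions. The upper bound over infinitely many pixel pairs mentioned in the abstract is not reproduced here. For target-domain pixels, the labels would presumably have to come from model predictions, which is also an assumption.

```python
import numpy as np


def class_means(features, labels, num_classes):
    """Mean feature vector per category (assumed stand-in for a semantic distribution)."""
    means = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            means[c] = features[mask].mean(axis=0)
    return means


def pixel_to_mean_contrastive_loss(features, labels, means, temperature=0.1):
    """Softmax contrastive loss pulling each pixel toward its own category mean.

    features: (N, D) pixel representations
    labels:   (N,) integer category index per pixel
    means:    (C, D) per-category mean features
    """
    # Cosine similarity between every pixel and every category mean.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    m = means / (np.linalg.norm(means, axis=1, keepdims=True) + 1e-12)
    logits = f @ m.T / temperature
    # Numerically stable log-softmax over categories.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Negative log-probability of the correct category, averaged over pixels.
    return -log_prob[np.arange(len(labels)), labels].mean()


# Toy example: four 2-D "pixel" features from two categories.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labs = np.array([0, 0, 1, 1])
mu = class_means(feats, labs, num_classes=2)
loss_matched = pixel_to_mean_contrastive_loss(feats, labs, mu)
loss_swapped = pixel_to_mean_contrastive_loss(feats, 1 - labs, mu)
print(loss_matched < loss_swapped)
```

In this toy setting the loss is lower when pixels are paired with their own category mean than when the labels are swapped, which is the clustering behaviour the abstract describes in words.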
