Seeking Similarities over Differences: Similarity-based Domain Alignment for Adaptive Object Detection

In order to robustly deploy object detectors across a wide range of scenarios, they should be adaptable to shifts in the input distribution without the need to constantly annotate new data. This has motivated research in Unsupervised Domain Adaptation (UDA) algorithms for detection. UDA methods learn to adapt from labeled source domains to unlabeled target domains, by inducing alignment between detector features from source and target domains. Yet, there is no consensus on what features to align and how to do the alignment. In our work, we propose a framework that generalizes the different components commonly used by UDA methods laying the ground for an in-depth analysis of the UDA design space. Specifically, we propose a novel UDA algorithm, ViSGA, a direct implementation of our framework, that leverages the best design choices and introduces a simple but effective method to aggregate features at instance-level based on visual similarity before inducing group alignment via adversarial training. We show that both similarity-based grouping and adversarial training allows our model to focus on coarsely aligning feature groups, without being forced to match all instances across loosely aligned domains. Finally, we examine the applicability of ViSGA to the setting where labeled data are gathered from different sources. Experiments show that not only our method outperforms previous single-source approaches on Sim2Real and Adverse Weather, but also generalizes well to the multi-source setting.

[1]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Luc Van Gool,et al.  Domain Adaptive Faster R-CNN for Object Detection in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3]  Xinlei Chen,et al.  Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.

[4]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Yi Yang,et al.  Contrastive Adaptation Network for Unsupervised Domain Adaptation , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[7]  Matthew Johnson-Roberson,et al.  Driving in the Matrix: Can virtual worlds replace human-generated annotations for real world tasks? , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[8]  Michael I. Jordan,et al.  Deep Transfer Learning with Joint Adaptation Networks , 2016, ICML.

[9]  Bo Wang,et al.  Moment Matching for Multi-Source Domain Adaptation , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[10]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[11]  Songtao Liu,et al.  Cross-domain Object Detection through Coarse-to-Fine Feature Adaptation , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Mei Wang,et al.  Deep Visual Domain Adaptation: A Survey , 2018, Neurocomputing.

[13]  Ali Farhadi,et al.  YOLO9000: Better, Faster, Stronger , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[15]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[16]  Trevor Darrell,et al.  Simultaneous Deep Transfer Across Domains and Tasks , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[17]  Weilin Huang,et al.  iFAN: Image-Instance Full Alignment Networks for Adaptive Object Detection , 2020, AAAI.

[18]  Changick Kim,et al.  Diversify and Match: A Domain Adaptive Representation Learning Paradigm for Object Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Luca Antiga,et al.  Automatic differentiation in PyTorch , 2017 .

[20]  Xinge Zhu,et al.  Adapting Object Detectors via Selective Cross-Domain Alignment , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Fabio Maria Carlucci,et al.  AutoDIAL: Automatic Domain Alignment Layers , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[22]  Andreas Geiger,et al.  Are we ready for autonomous driving? The KITTI vision benchmark suite , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[23]  Bingbing Ni,et al.  Cross-Domain Detection via Graph-Induced Prototype Alignment , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Lei Zhang,et al.  Multi-Adversarial Faster-RCNN for Unrestricted Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[25]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[26]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[27]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Kate Saenko,et al.  Strong-Weak Distribution Alignment for Adaptive Object Detection , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  José M. F. Moura,et al.  Adversarial Multiple Source Domain Adaptation , 2018, NeurIPS.

[30]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[32]  Ivor W. Tsang,et al.  Visual Event Recognition in Videos by Learning from Web Data , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[34]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[35]  Liang Lin,et al.  Deep Cocktail Network: Multi-source Unsupervised Domain Adaptation with Category Shift , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[36]  Rama Chellappa,et al.  Domain adaptation for object recognition: An unsupervised approach , 2011, 2011 International Conference on Computer Vision.

[37]  Zhiguo Cao,et al.  When Unsupervised Domain Adaptation Meets Tensor Representations , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[38]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[39]  Jian Sun,et al.  Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[40]  Paul A. Viola,et al.  Rapid object detection using a boosted cascade of simple features , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[41]  Luc Van Gool,et al.  Semantic Foggy Scene Understanding with Synthetic Data , 2017, International Journal of Computer Vision.

[42]  D. Defays,et al.  An Efficient Algorithm for a Complete Link Method , 1977, Comput. J..

[43]  Chong-Wah Ngo,et al.  Exploring Object Relation in Mean Teacher for Cross-Domain Detection , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  Xiu-Shen Wei,et al.  Exploring Categorical Regularization for Domain Adaptive Object Detection , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[45]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[46]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Xinghao Ding,et al.  Harmonizing Transferability and Discriminability for Adapting Object Detectors , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[48]  Donald A. Adjeroh,et al.  Unified Deep Supervised Domain Adaptation and Generalization , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[49]  Kurt Keutzer,et al.  Multi-source Domain Adaptation for Semantic Segmentation , 2019, NeurIPS.

[50]  Ivor W. Tsang,et al.  Domain Transfer Multiple Kernel Learning , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.