AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection

Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications, yet it has received much less attention than supervised object detection. Models that address this task tend to suffer from a shortage of annotated training samples. Moreover, existing feature alignment methods are not sufficient to learn domain-invariant representations. To address these limitations, we propose a novel augmented feature alignment network (AFAN), which integrates intermediate domain image generation and domain-adversarial training into a unified framework. An intermediate domain image generator enhances feature alignment through domain-adversarial training with automatically generated soft domain labels. The synthetic intermediate domain images progressively bridge the domain divergence and augment the annotated source domain training data. A feature pyramid alignment, with a corresponding feature discriminator, aligns multi-scale convolutional features of different semantic levels. Finally, we introduce a region feature alignment and an instance discriminator to learn domain-invariant features for object proposals. Our approach significantly outperforms state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations. Further extensive experiments verify the effectiveness of each component and demonstrate that the proposed network can learn domain-invariant representations.
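
The idea of an intermediate domain image paired with a soft domain label can be pictured with a small sketch. The abstract does not specify how the generator works, so the code below assumes a simple mixup-style pixel interpolation between a source and a target image. The function names `make_intermediate` and `soft_domain_bce`, the mixing ratio `lam`, and the label convention (0 for source, 1 for target) are hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def make_intermediate(source_img, target_img, lam):
    """Blend two images of equal shape; lam is the share of the source image.

    Assumption: plain linear interpolation stands in for the generator.
    Returns the blended image and a soft domain label in [0, 1],
    where 0 means pure source and 1 means pure target.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mixed = lam * source_img + (1.0 - lam) * target_img
    soft_label = 1.0 - lam
    return mixed, soft_label


def soft_domain_bce(pred, soft_label, eps=1e-7):
    """Binary cross-entropy of discriminator outputs against a soft label.

    pred holds the discriminator's probabilities that the input is target.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(soft_label * np.log(pred) + (1.0 - soft_label) * np.log(1.0 - pred))
    return float(np.mean(loss))


# Usage: blend a dummy source image (all zeros) with a dummy target (all ones).
source = np.zeros((4, 4, 3))
target = np.ones((4, 4, 3))
mixed, label = make_intermediate(source, target, lam=0.25)
loss = soft_domain_bce(np.full((4, 4), 0.5), label)
```

With a soft label, the discriminator loss is smallest when the prediction matches the mixing ratio rather than either extreme, which is one way to read how intermediate images could smooth the transition between domains during adversarial training.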
