Paper Information - AFAN: Augmented Feature Alignment Network for Cross-Domain Object Detection

Unsupervised domain adaptation for object detection is a challenging problem with many real-world applications. Unfortunately, it has received much less attention than supervised object detection. Models that try to address this task tend to suffer from a shortage of annotated training samples. Moreover, existing methods of feature alignments are not sufficient to learn domain-invariant representations. To address these limitations, we propose a novel augmented feature alignment network (AFAN) which integrates intermediate domain image generation and domain-adversarial training into a unified framework. An intermediate domain image generator is proposed to enhance feature alignments by domain-adversarial training with automatically generated soft domain labels. The synthetic intermediate domain images progressively bridge the domain divergence and augment the annotated source domain training data. A feature pyramid alignment is designed and the corresponding feature discriminator is used to align multi-scale convolutional features of different semantic levels. Last but not least, we introduce a region feature alignment and an instance discriminator to learn domain-invariant features for object proposals. Our approach significantly outperforms the state-of-the-art methods on standard benchmarks for both similar and dissimilar domain adaptations. Further extensive experiments verify the effectiveness of each component and demonstrate that the proposed network can learn domain-invariant representations.
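The abstract does not spell out how the intermediate domain images or their soft domain labels are computed. As a rough illustration only, the sketch below assumes a mixup-style pixel interpolation between a source and a target image, uses the mixing ratio as the soft domain label, and scores a discriminator output with a binary cross-entropy that accepts soft labels. The function names (`mix_images`, `soft_domain_label`, `soft_bce`) and the flat-list image representation are hypothetical and are not taken from the paper.

```python
import math


def mix_images(source, target, lam):
    """Blend two equally sized images given as flat lists of pixel values.

    Assumption: the intermediate image is a linear interpolation,
    lam * source + (1 - lam) * target, with lam in [0, 1].
    """
    if len(source) != len(target):
        raise ValueError("images must have the same number of pixels")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * s + (1.0 - lam) * t for s, t in zip(source, target)]


def soft_domain_label(lam):
    """Soft domain label: 1.0 means pure source, 0.0 means pure target.

    Assumption: the label equals the mixing ratio itself.
    """
    return lam


def soft_bce(pred, label, eps=1e-7):
    """Binary cross-entropy between a predicted source probability
    and a (possibly fractional) domain label."""
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


# Toy example: a 4-pixel "source" image and a 4-pixel "target" image.
source = [0.0, 0.2, 0.4, 0.6]
target = [1.0, 1.0, 1.0, 1.0]
lam = 0.25  # the mixed image is 25% source, 75% target

mixed = mix_images(source, target, lam)
label = soft_domain_label(lam)

# Stand-in for a feature discriminator's output on the mixed image.
discriminator_output = 0.25
loss = soft_bce(discriminator_output, label)
print(mixed, label, loss)
```

With soft labels, the discriminator loss is smallest when the predicted source probability matches the mixing ratio, so intermediate images supply training signal between the two hard domain labels. In a domain-adversarial setup the feature extractor would be trained against this loss (for example through a gradient reversal layer), which is not shown here.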
