Pixel and Feature Level Based Domain Adaption for Object Detection in Autonomous Driving

Annotating large scale datasets to train modern convolutional neural networks is prohibitively expensive and time-consuming for many real tasks. One alternative is to train the model on labeled synthetic datasets and apply it in the real scenes. However, this straightforward method often fails to generalize well mainly due to the domain bias between the synthetic and real datasets. Many unsupervised domain adaptation (UDA) methods are introduced to address this problem but most of them only focus on the simple classification task. In this paper, we present a novel UDA model to solve the more complex object detection problem in the context of autonomous driving. Our model integrates both pixel level and feature level based transformtions to fulfill the cross domain detection task and can be further trained end-to-end to pursue better performance. We employ objectives of the generative adversarial network and the cycle consistency loss for image translation in the pixel space. To address the potential semantic inconsistency problem, we propose region proposal based feature adversarial training to preserve the semantics of our target objects as well as further minimize the domain shifts. Extensive experiments are conducted on several different datasets, and the results demonstrate the robustness and superiority of our method.

[1]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[2]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3]  Wei Liu,et al.  Knowledge transfer for spectral clustering , 2018, Pattern Recognit..

[4]  Fu-Lai Chung,et al.  Stacked Robust Adaptively Regularized Auto-Regressions for Domain Adaptation , 2019, IEEE Transactions on Knowledge and Data Engineering.

[5]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Alexei A. Efros,et al.  Toward Multimodal Image-to-Image Translation , 2017, NIPS.

[7]  Kate Saenko,et al.  Return of Frustratingly Easy Domain Adaptation , 2015, AAAI.

[8]  Wei Liu,et al.  Deep Self-Taught Learning for Weakly Supervised Object Localization , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[10]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[11]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[12]  George Trigeorgis,et al.  Domain Separation Networks , 2016, NIPS.

[13]  Matthew Johnson-Roberson,et al.  Driving in the Matrix: Can virtual worlds replace human-generated annotations for real world tasks? , 2016, 2017 IEEE International Conference on Robotics and Automation (ICRA).

[14]  Koby Crammer,et al.  A theory of learning from different domains , 2010, Machine Learning.

[15]  Michael I. Jordan,et al.  Learning Transferable Features with Deep Adaptation Networks , 2015, ICML.

[16]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17]  Michael I. Jordan,et al.  Unsupervised Domain Adaptation with Residual Transfer Networks , 2016, NIPS.

[18]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Wei Liu,et al.  Theoretic Analysis and Extremely Easy Algorithms for Domain Adaptive Feature Learning , 2015, IJCAI.

[20]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Ali Farhadi,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[22]  Ivor W. Tsang,et al.  Domain Adaptation via Transfer Component Analysis , 2009, IEEE Transactions on Neural Networks.

[23]  Koby Crammer,et al.  Analysis of Representations for Domain Adaptation , 2006, NIPS.

[24]  Yi Li,et al.  R-FCN: Object Detection via Region-based Fully Convolutional Networks , 2016, NIPS.

[25]  Wenyu Liu,et al.  Weakly Supervised Region Proposal Network and Object Detection , 2018, ECCV.

[26]  Andreas Geiger,et al.  Vision meets robotics: The KITTI dataset , 2013, Int. J. Robotics Res..

[27]  Xin Wang,et al.  Few-Shot Object Detection via Feature Reweighting , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[28]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Raymond Y. K. Lau,et al.  Least Squares Generative Adversarial Networks , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[30]  Qiao Wang,et al.  VirtualWorlds as Proxy for Multi-object Tracking Analysis , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[34]  Wei Liu,et al.  DSSD : Deconvolutional Single Shot Detector , 2017, ArXiv.

[35]  Ming-Yu Liu,et al.  Coupled Generative Adversarial Networks , 2016, NIPS.

[36]  Thomas S. Huang,et al.  Image Super-Resolution Via Sparse Representation , 2010, IEEE Transactions on Image Processing.

[37]  Antonio Torralba,et al.  Cross-Modal Scene Networks , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[38]  Victor S. Lempitsky,et al.  Unsupervised Domain Adaptation by Backpropagation , 2014, ICML.

[39]  Luc Van Gool,et al.  Semantic Foggy Scene Understanding with Synthetic Data , 2017, International Journal of Computer Vision.

[40]  Philip S. Yu,et al.  Transfer Feature Learning with Joint Distribution Adaptation , 2013, 2013 IEEE International Conference on Computer Vision.

[41]  Hang Zhang,et al.  Multi-style Generative Network for Real-time Transfer , 2017, ECCV Workshops.

[42]  Hans-Peter Kriegel,et al.  Integrating structured biological data by Kernel Maximum Mean Discrepancy , 2006, ISMB.

[43]  Dumitru Erhan,et al.  Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[44]  François Laviolette,et al.  Domain-Adversarial Training of Neural Networks , 2015, J. Mach. Learn. Res..

[45]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[46]  Kilian Q. Weinberger,et al.  Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[47]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[48]  Nitish Srivastava,et al.  Multimodal learning with deep Boltzmann machines , 2012, J. Mach. Learn. Res..

[49]  Eric P. Xing,et al.  Semantic-aware Grad-GAN for Virtual-to-Real Urban Scene Adaption , 2018, BMVC.

[50]  Luc Van Gool,et al.  Domain Adaptive Faster R-CNN for Object Detection in the Wild , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[51]  Charu C. Aggarwal,et al.  Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[52]  Sebastian Ramos,et al.  The Cityscapes Dataset for Semantic Urban Scene Understanding , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Rama Chellappa,et al.  Domain adaptation for object recognition: An unsupervised approach , 2011, 2011 International Conference on Computer Vision.

[55]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[56]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).