Cross-Domain Adaptation for On-Road Object Detection Using Multimodal Structure-Consistent Image-to-Image Translation

Image-to-image translation has the potential to boost the detection accuracy of a CNN-based object detector in a different domain. Although recent GAN (Generative Adversarial Network) based methods have shown compelling visual results, they tend to fail at preserving image objects and maintaining structure consistency under large and complex domain shifts such as day-to-night, which limits their practicality for tasks such as generating large-scale training data for new domains. In this work, we introduce image-translation-structure consistency and cycle-structure consistency to generate diverse, structure-preserving translated images across complex domains, such as between day and night, for object detector training. Given only a single labeled daytime image, our model can generate a diverse collection of nighttime images with different ambient light levels and rear-lamp conditions (on/off) while preserving the vehicle types, colors, and locations of the source. Qualitative results show that our model generates diverse and realistic images in the target domain. For quantitative comparison, we train Faster R-CNN and YOLO detectors on images generated by our model and by competing methods, and show that our model achieves a significant improvement in detection accuracy over the alternatives.
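The abstract names two objectives, image-translation-structure consistency and cycle-structure consistency, but does not give their formulations. The PyTorch sketch below shows one common way such terms are realized; it is an illustration under stated assumptions, not the authors' implementation. `G_dn`, `G_nd`, and `seg` are hypothetical day-to-night and night-to-day generators and a shared structure (e.g., segmentation) head.

```python
import torch
import torch.nn.functional as F

def consistency_losses(x_day, G_dn, G_nd, seg):
    """Sketch of the two consistency terms (hypothetical formulations).

    G_dn / G_nd : day-to-night and night-to-day generators (assumed names)
    seg         : structure head shared across domains, e.g. segmentation logits
    """
    x_night = G_dn(x_day)        # translate a labeled day image to night
    x_cycle = G_nd(x_night)      # translate back to day

    # Cycle-structure consistency: the day -> night -> day round trip
    # should reconstruct the original image, keeping objects in place.
    loss_cycle = F.l1_loss(x_cycle, x_day)

    # Image-translation-structure consistency: the structure prediction of
    # the translated image should match that of its source, so vehicle
    # types and locations survive the translation.
    loss_struct = F.l1_loss(seg(x_night), seg(x_day))

    return loss_cycle, loss_struct

# Toy usage with stand-in networks (illustration only):
G_dn = torch.nn.Conv2d(3, 3, 3, padding=1)
G_nd = torch.nn.Conv2d(3, 3, 3, padding=1)
seg = torch.nn.Conv2d(3, 8, 3, padding=1)   # 8 hypothetical structure classes
loss_cycle, loss_struct = consistency_losses(
    torch.randn(1, 3, 256, 256), G_dn, G_nd, seg)
```

In the multimodal setting the abstract describes, `G_dn` would additionally condition on a sampled style code so that one day image maps to many night renderings (different ambient light levels, rear lamps on or off) with the same structure; that conditioning is omitted here for brevity.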
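The quantitative protocol, training detectors on the translated images, can be sketched with torchvision's reference Faster R-CNN (a minimal sketch assuming torchvision >= 0.13; the paper's exact detector configuration is not given in the abstract). Because translation preserves object locations, the translated night images can reuse the bounding boxes of their labeled daytime sources.

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Hypothetical two-class setup: background + vehicle.
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=2)
model.train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_step(images, targets):
    """One optimization step on a batch of translated images.

    images  : list of CHW float tensors in [0, 1]
    targets : list of dicts with 'boxes' (N x 4, xyxy) and 'labels' (N,),
              copied from the corresponding daytime annotations
    """
    loss_dict = model(images, targets)   # in train mode the model returns losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy batch for illustration:
images = [torch.rand(3, 300, 400)]
targets = [{"boxes": torch.tensor([[50.0, 60.0, 150.0, 160.0]]),
            "labels": torch.tensor([1])}]
print(train_step(images, targets))
```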