Synthetic-to-Real Domain Adaptation for Object Instance Segmentation

Object instance segmentation can achieve strong results when sufficient labeled training data are available. However, manual labeling is time-consuming, which has led to a lack of large-scale, diverse datasets with accurate instance segmentation annotations. Exploiting synthetic data is a promising solution, but it is hindered by the mismatch between the data distributions of synthetic and real datasets. In this paper, we propose a synthetic-to-real domain adaptation method for object instance segmentation. First, the model is trained to perform object detection and segmentation using annotated data from a synthetic dataset. Then, a feature adaptation module (FAM) is applied to reduce the distribution mismatch between the synthetic and real datasets. The FAM performs domain adaptation at three levels: a global-level base feature adaptation module, a local-level instance feature adaptation module, and a subtle-level mask feature adaptation module. The FAM is implemented with novel discriminator networks trained through adversarial learning. Each of the three modules contributes to improved performance when adapting from synthetic to real scenes. We evaluate the proposed approach on the Cityscapes dataset, adapting from the Virtual KITTI and SYNTHIA datasets. The results show that it significantly outperforms state-of-the-art methods.
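The abstract does not specify the discriminator architectures, the exact adversarial loss, or how the three levels are weighted. The following is a minimal sketch, assuming that each FAM level (base, instance, mask) has a binary domain discriminator that outputs the probability that a feature came from the synthetic (source) domain. It shows how the adversarial terms could be scored and combined with the supervised loss computed on synthetic data. The function names, the level keys, the inverted-label formulation (the feature extractor is rewarded when real features are classified as synthetic), and the weights are all illustrative assumptions and are not taken from the paper. A gradient reversal layer is a common alternative to the inverted-label loss used here.

```python
import math
from typing import Dict, Sequence

# Hypothetical sketch: the per-level discriminators, the inverted-label
# adversarial loss, and the weights below are assumptions, not the paper's.


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one discriminator output prob in (0, 1)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp so log() never sees 0
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(source_probs: Sequence[float],
                       target_probs: Sequence[float]) -> float:
    """Minimized by a discriminator: predict 1 for synthetic, 0 for real."""
    losses = [bce(p, 1) for p in source_probs] + [bce(p, 0) for p in target_probs]
    return sum(losses) / len(losses)


def adversarial_loss(target_probs: Sequence[float]) -> float:
    """Minimized by the feature extractor: make real features look synthetic."""
    return sum(bce(p, 1) for p in target_probs) / len(target_probs)


def total_loss(task_loss: float,
               target_probs_by_level: Dict[str, Sequence[float]],
               weights: Dict[str, float]) -> float:
    """Supervised loss on synthetic data plus weighted adversarial terms."""
    return task_loss + sum(
        weights[level] * adversarial_loss(probs)
        for level, probs in target_probs_by_level.items()
    )


if __name__ == "__main__":
    # Discriminator outputs on real (target) features at each hypothetical level.
    probs = {
        "base": [0.2, 0.4],           # global: one output per image feature map
        "instance": [0.3, 0.1, 0.5],  # local: one output per region proposal
        "mask": [0.6],                # subtle: one output per mask feature
    }
    weights = {"base": 0.1, "instance": 0.1, "mask": 0.1}  # illustrative values
    print(round(total_loss(1.25, probs, weights), 4))
```

In an actual training loop, the discriminators and the segmentation network would be updated alternately, and each would minimize its own loss. This sketch only shows how the individual terms are computed and summed.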
