An Adversarial Self-Learning Method for Cross-City Adaptation in Semantic Segmentation

Semantic segmentation is an important task in the visual system of self-driving cars. Segmentation models based on convolutional neural networks (CNNs) and trained on large numbers of annotated labels may not work well in environments that differ from the training set, because of the domain gap between the training and test domains. Domain adaptation methods have been proposed to reduce the distance between the source and target domains through unsupervised training on the unlabeled target domain. In addition to reducing the domain shift, we propose a self-learning method to enhance the predicted probabilities on the target domain. To obtain more accurate target-domain probability maps from a segmentation model trained on the source domain, we propose an adversarial self-learning method that consists of a domain adaptation part and a self-learning part. The model is first adapted with the domain adaptation method; self-learning then maximizes the predicted probabilities in the probability maps it produces for the target domain. In Cityscapes-to-NTHU cross-city adaptation experiments, the adversarial self-learning method achieves state-of-the-art results compared with recently proposed domain adaptation methods.
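The abstract does not specify the self-learning objective, so the following is only a minimal sketch of one common way to "maximize the predicted probabilities" on unlabeled target images: take the most confident class per pixel as a pseudo-label and minimize the negative log-probability on pixels whose confidence exceeds a threshold. The function names, the `(C, H, W)` logit layout, and the `threshold` value are assumptions made for illustration, not details from the paper.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_learning_loss(logits, threshold=0.9):
    """Pseudo-label loss for one unlabeled target image (hypothetical helper).

    logits: array of shape (C, H, W) with per-class scores from a
            segmentation model that has already been domain-adapted.
    threshold: assumed confidence cut-off; pixels below it are ignored.

    Returns (loss, pseudo_labels, mask).
    """
    probs = softmax(logits, axis=0)        # per-pixel class probabilities
    confidence = probs.max(axis=0)         # probability of the top class
    pseudo_labels = probs.argmax(axis=0)   # top class used as the label
    mask = confidence >= threshold         # keep only confident pixels
    if not mask.any():
        return 0.0, pseudo_labels, mask
    # Minimizing -log(p_max) on confident pixels pushes p_max toward 1.
    loss = float(-np.log(confidence[mask]).mean())
    return loss, pseudo_labels, mask
```

In a full training loop this loss would be added to the adversarial adaptation loss and minimized by gradient descent in a deep learning framework; NumPy is used here only to keep the arithmetic visible.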
