Do Game Data Generalize Well for Remote Sensing Image Segmentation?

Despite the recent progress in deep learning and remote sensing image analysis, the adaption of a deep learning model between different sources of remote sensing data still remains a challenge. This paper investigates an interesting question: do synthetic data generalize well for remote sensing image applications? To answer this question, we take the building segmentation as an example by training a deep learning model on the city map of a well-known PC game “Grand Theft Auto V” and then adapting the model to real-world remote sensing images. We propose a Generative Adversarial Networks (GAN) based segmentation framework to improve the adaptability of the segmentation model. Our model consists of a CycleGAN model and a ResNet based segmentation model, where the former one is a well-known image to image translation framework which learns a mapping of the image from the game domain to the remote sensing domain; and the latter learns to predict pixel-wise building masks based on the transformed data. All models in our method can be trained in an end-to-end fashion. The segmentation model can be trained without requiring any additional ground truth reference of the real-world images. Experimental results on a public building segmentation dataset suggest the effectiveness of our adaptation method. Our method outperforms some other state of the art semantic segmentation methods, e.g. Deeplab-v3 and UNet. Another advantage of our method is that by introducing semantic information to the image to image translation framework, the image style conversion can be further improved.

[1]  Mei Wang,et al.  Deep Visual Domain Adaptation: A Survey , 2018, Neurocomputing.

[2]  Han Zhang,et al.  Self-Attention Generative Adversarial Networks , 2018, ICML.

[3]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Jiebo Luo,et al.  DOTA: A Large-Scale Dataset for Object Detection in Aerial Images , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5]  Swami Sankaranarayanan,et al.  Learning from Synthetic Data: Addressing Domain Shift for Semantic Segmentation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[6]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[7]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Vladlen Koltun,et al.  Playing for Data: Ground Truth from Computer Games , 2016, ECCV.

[9]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[10]  Chi-Keung Tang,et al.  Attribute-Guided Face Generation Using Conditional CycleGAN , 2017, ECCV.

[11]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Jung-Woo Ha,et al.  StarGAN: Unified Generative Adversarial Networks for Multi-domain Image-to-Image Translation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[13]  Ming-Hsuan Yang,et al.  Universal Style Transfer via Feature Transforms , 2017, NIPS.

[14]  Antonio M. López,et al.  The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Kiyoharu Aizawa,et al.  Cross-Domain Weakly-Supervised Object Detection Through Progressive Domain Adaptation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[16]  Chong-Wah Ngo,et al.  Semi-supervised Domain Adaptation with Subspace Learning for visual recognition , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[18]  Abhinav Gupta,et al.  A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.

[20]  Pierre Alliez,et al.  Convolutional Neural Networks for Large-Scale Remote-Sensing Image Classification , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[21]  Fei-Fei Li,et al.  Deep visual-semantic alignments for generating image descriptions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22]  Fan Zhang,et al.  TreeUNet: Adaptive Tree convolutional neural networks for subdecimeter aerial image segmentation , 2019, ISPRS Journal of Photogrammetry and Remote Sensing.

[23]  Siyuan Liu,et al.  Unsupervised Image Super-Resolution Using Cycle-in-Cycle Generative Adversarial Networks , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[24]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[25]  Eero P. Simoncelli,et al.  Image quality assessment: from error visibility to structural similarity , 2004, IEEE Transactions on Image Processing.

[26]  Jieping Ye,et al.  Object Detection in 20 Years: A Survey , 2019, Proceedings of the IEEE.

[27]  Yongqiang Zhang,et al.  SOD-MTGAN: Small Object Detection via Multi-Task Generative Adversarial Network , 2018, ECCV.

[28]  Dong Liu,et al.  Fully Convolutional Adaptation Networks for Semantic Segmentation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[29]  Seunghoon Hong,et al.  Learning Deconvolution Network for Semantic Segmentation , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[31]  Iasonas Kokkinos,et al.  Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs , 2014, ICLR.

[32]  Jiancheng Luo,et al.  DE-Net: Deep Encoding Network for Building Extraction from High-Resolution Remote Sensing Imagery , 2019, Remote. Sens..

[33]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[34]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[35]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[36]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Gang Hua,et al.  Visual attribute transfer through deep image analogy , 2017, ACM Trans. Graph..

[38]  Qi Wu,et al.  Image Captioning and Visual Question Answering Based on Attributes and External Knowledge , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[39]  Yu Qiao,et al.  ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks , 2018, ECCV Workshops.

[40]  Thomas Brox,et al.  U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[41]  Ieee Xplore,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[42]  Adam Finkelstein,et al.  PairedCycleGAN: Asymmetric Style Transfer for Applying and Removing Makeup , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[43]  Xiaoou Tang,et al.  Image Super-Resolution Using Deep Convolutional Networks , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Jefersson Alex dos Santos,et al.  Do deep features generalize from everyday objects to remote sensing and aerial scenes domains? , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[45]  Yunchao Wei,et al.  Perceptual Generative Adversarial Networks for Small Object Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[46]  Zhenwei Shi,et al.  Random Access Memories: A New Paradigm for Target Detection in High Resolution Aerial Remote Sensing Images , 2018, IEEE Transactions on Image Processing.

[47]  Geoffrey E. Hinton,et al.  Machine Learning for Aerial Image Labeling , 2013 .

[48]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[49]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[50]  Rama Chellappa,et al.  Visual Domain Adaptation: A survey of recent advances , 2015, IEEE Signal Processing Magazine.

[51]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52]  Rob Fergus,et al.  Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks , 2015, NIPS.

[53]  Leon A. Gatys,et al.  Image Style Transfer Using Convolutional Neural Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[54]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[55]  Taesung Park,et al.  CyCADA: Cycle-Consistent Adversarial Domain Adaptation , 2017, ICML.

[56]  Trevor Darrell,et al.  Fully Convolutional Networks for Semantic Segmentation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[57]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[58]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[59]  Bo Du,et al.  Deep Learning for Remote Sensing Data: A Technical Tutorial on the State of the Art , 2016, IEEE Geoscience and Remote Sensing Magazine.

[60]  Meng Lu,et al.  Fully Convolutional Networks for Multisource Building Extraction From an Open Aerial and Satellite Imagery Data Set , 2019, IEEE Transactions on Geoscience and Remote Sensing.

[61]  Ali Farhadi,et al.  You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[62]  Tinne Tuytelaars,et al.  Unsupervised Visual Domain Adaptation Using Subspace Alignment , 2013, 2013 IEEE International Conference on Computer Vision.

[63]  Ross B. Girshick,et al.  Fast R-CNN , 2015, 1504.08083.

[64]  Jeff Donahue,et al.  Large Scale GAN Training for High Fidelity Natural Image Synthesis , 2018, ICLR.

[65]  Christian Ledig,et al.  Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[66]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[67]  Zhenwei Shi,et al.  Can a Machine Generate Humanlike Language Descriptions for a Remote Sensing Image? , 2017, IEEE Transactions on Geoscience and Remote Sensing.

[68]  Dongyang Li,et al.  Semantic Segmentation of Remote Sensing Images Using Multiscale Decoding Network , 2019, IEEE Geoscience and Remote Sensing Letters.

[69]  Ming Lu,et al.  Decoder Network over Lightweight Reconstructed Feature for Fast Semantic Style Transfer , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[70]  Simon Hessner,et al.  Image Style Transfer using Convolutional Neural Networks , 2018 .