A novel framework for semantic segmentation with generative adversarial network

Abstract Semantic segmentation plays an important role in a series of high-level computer vision applications. In the state-of-the-art semantic segmentation methods based on fully convolutional neural networks, all label variables are predicted independently from each other, and the restricted field-of-views of the convolutional filters are difficult to capture the long-range information. In this paper, a novel post-processing method based on GAN (Generative Adversarial Network) is explored to reinforce spatial contiguity in the output label maps. With the help of fully connected layers in the discriminator, the GAN can capture the long-range information, and provide an auxiliary higher-order potential loss to the segmentation model, thus the segmentation model has the ability of correcting higher order inconsistencies. Furthermore, the optimization scheme in Wasserstein GAN (WGAN) is adopted to the training process of our model to get better performance and stability. Extensive experiments on public benchmarking database demonstrate the effectiveness of the proposed method.

[1]  Peter V. Gehler,et al.  Efficient Facade Segmentation Using Auto-context , 2015, 2015 IEEE Winter Conference on Applications of Computer Vision.

[2]  Zoe L. Jiang,et al.  Scalable graph based non-negative multi-view embedding for image ranking , 2018, Neurocomputing.

[3]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[4]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[5]  Iasonas Kokkinos,et al.  DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Luming Zhang,et al.  HyperSSR: A hypergraph based semi-supervised ranking method for visual search reranking , 2018, Neurocomputing.

[7]  Xuelong Li,et al.  Modeling Disease Progression via Multisource Multitask Learners: A Case Study With Alzheimer’s Disease , 2017, IEEE Transactions on Neural Networks and Learning Systems.

[8]  Stephen Gould,et al.  Decomposing a scene into geometric and semantically consistent regions , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Gang Wang,et al.  Learning Common and Specific Features for RGB-D Semantic Segmentation with Deconvolutional Networks , 2016, ECCV.

[11]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Jun Ma,et al.  NeuroStylist: Neural Compatibility Modeling for Clothing Matching , 2017, ACM Multimedia.

[13]  Qi Tian,et al.  Enhancing Micro-video Understanding by Harnessing External Sounds , 2017, ACM Multimedia.

[14]  David S. Rosenblum,et al.  From action to activity: Sensor-based activity recognition , 2016, Neurocomputing.