STDGAN: ResBlock Based Generative Adversarial Nets Using Spectral Normalization and Two Different Discriminators

The generative adversarial network (GAN) is a powerful generative model, but it suffers from two key problems: unstable convergence and mode collapse. To overcome these drawbacks, this paper presents a novel GAN architecture, called STDGAN, which consists of one generator and two different discriminators. Viewing GAN training as a minimax game, the proposed architecture works as follows. The generator G aims to produce realistic-looking samples that fool both discriminators. The first discriminator D1 assigns high scores to samples from the data distribution, while the second discriminator D2 conversely favors samples from the generator. Specifically, minibatch discrimination and Spectral Normalization (SN) are adopted in D1. Then, based on the ResBlock architecture, Spectral Normalization and Scaled Exponential Linear Units (SELU) are applied in the first and last half of the layers of D2, respectively. In addition, a novel loss function is designed to optimize STDGAN by minimizing the KL divergence. Extensive experiments on the CIFAR-10/100 and ImageNet datasets demonstrate that STDGAN effectively mitigates the convergence and mode-collapse problems and achieves a higher Inception Score (IS) and a lower Fréchet Inception Distance (FID) than other state-of-the-art GANs.
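The abstract does not spell out the loss function, so the following is only a toy sketch of how a two-discriminator objective of this kind could look, assuming a standard adversarial log-loss with the roles of the two discriminators reversed (the function names and scalar inputs are hypothetical stand-ins for discriminator output probabilities):

```python
import math

def d1_loss(d1_real, d1_fake):
    # D1 is trained like a standard GAN discriminator:
    # high scores for real samples, low scores for generated ones.
    return -(math.log(d1_real) + math.log(1.0 - d1_fake))

def d2_loss(d2_real, d2_fake):
    # D2 is trained with the roles reversed:
    # it favors generated samples over real ones.
    return -(math.log(1.0 - d2_real) + math.log(d2_fake))

def g_loss(d1_fake, d2_fake):
    # The generator tries to fool both discriminators at once:
    # push D1's score on fakes up and D2's score on fakes down.
    return -(math.log(d1_fake) + math.log(1.0 - d2_fake))
```

Under this formulation, a generator sample that both discriminators misjudge (D1 scores it near 1, D2 near 0) yields a low generator loss, which mirrors the paper's description of G fooling both D1 and D2; the actual STDGAN loss additionally involves a KL-divergence term not reproduced here.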
