Tackling mode collapse in multi-generator GANs with orthogonal vectors

Abstract

Generative Adversarial Networks (GANs) have been widely used to generate realistic-looking instances. However, training a robust GAN is a non-trivial task due to the problem of mode collapse. Although many GAN variants have been proposed to overcome this problem, they have limitations: existing approaches either generate near-identical instances across generators or suffer from negative gradients during training. In this paper, we propose a new approach to training GANs that overcomes mode collapse by employing a set of generators, an encoder and a discriminator. A new minimax formulation is proposed to train all components simultaneously, in a similar spirit to the vanilla GAN. An orthogonal-vector strategy guides the multiple generators to learn different information in a complementary manner; accordingly, we term our approach Multi-Generator Orthogonal GAN (MGO-GAN). Specifically, the synthetic data produced by the generators are fed into the encoder to obtain feature vectors. An orthogonal value is calculated between any two feature vectors, which faithfully reflects the correlation between them. This correlation indicates how much distinct information the generators have learnt: the lower the orthogonal value, the more diverse the information the generators capture. During training, the orthogonal value is minimized along with the generator loss through back-propagation; it is integrated with the original generator loss to jointly update the corresponding generator's parameters. We conduct extensive experiments on the MNIST, CIFAR-10 and CelebA datasets to demonstrate that MGO-GAN significantly improves the quality and diversity of the generated data at different resolutions.
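To make the orthogonal-value mechanism concrete, the following PyTorch sketch shows one plausible reading of it: encoder features of each generator's samples are compared pairwise, and the resulting scalar is added to the adversarial generator loss. This is a hypothetical illustration, not the authors' released code; the function names, the use of absolute cosine similarity as the orthogonal value, the non-saturating adversarial loss, and the weighting coefficient lam are all assumptions.

```python
# Hypothetical sketch of the MGO-GAN orthogonal-value penalty (assumptions
# throughout): the "orthogonal value" is taken here to be the mean pairwise
# absolute cosine similarity between the generators' encoded feature vectors.
import itertools
import torch
import torch.nn.functional as F

def orthogonal_value(features):
    """Mean pairwise |cosine similarity| across generators' feature vectors.

    features: list of tensors, one per generator, each of shape (batch, dim).
    Returns a scalar that equals 0 when all feature pairs are orthogonal.
    """
    normed = [F.normalize(f, dim=1) for f in features]
    pair_vals = [
        (normed[i] * normed[j]).sum(dim=1).abs().mean()
        for i, j in itertools.combinations(range(len(normed)), 2)
    ]
    return torch.stack(pair_vals).mean()

def generator_step(generators, encoder, discriminator, z_batches, lam=0.1):
    """One joint generator update: adversarial loss plus orthogonal penalty."""
    fakes = [G(z) for G, z in zip(generators, z_batches)]
    feats = [encoder(x) for x in fakes]
    # Non-saturating GAN loss summed over generators (one plausible choice,
    # not necessarily the paper's exact minimax objective).
    adv = 0.0
    for x in fakes:
        logits = discriminator(x)
        adv = adv + F.binary_cross_entropy_with_logits(
            logits, torch.ones_like(logits))
    # Minimizing the orthogonal value pushes the generators' feature vectors
    # toward orthogonality, i.e. toward learning complementary information.
    loss = adv + lam * orthogonal_value(feats)
    loss.backward()
    return loss
```

Cosine similarity is chosen here because a value of zero corresponds exactly to orthogonal feature vectors, matching the abstract's statement that a lower orthogonal value indicates more complementary generators.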
