Implicit competitive regularization in GANs

To improve the stability of GAN training we need to understand why GANs can produce realistic samples. Currently, this ability is attributed to properties of the divergence obtained under an optimal discriminator. This argument has a fundamental flaw: if we do not impose regularity on the discriminator, it can exploit visually imperceptible errors of the generator to always achieve the maximal generator loss. In practice, gradient penalties are used to regularize the discriminator. However, this requires a metric on the space of images that captures visual similarity; no such metric is known, which explains the limited success of gradient penalties in stabilizing GANs. We argue that the performance of GANs is instead due to the implicit competitive regularization (ICR) that arises from the simultaneous optimization of generator and discriminator. ICR promotes solutions that look real to the discriminator and thus leverages its inductive biases to generate realistic images. We show that opponent-aware modeling of generator and discriminator, as present in competitive gradient descent (CGD), can significantly strengthen ICR and thus stabilize GAN training without explicit regularization. In our experiments, we use an existing implementation of WGAN-GP and show that training it with CGD improves the inception score (IS) on CIFAR10 across a wide range of scenarios, without any hyperparameter tuning. The highest IS is obtained by combining CGD with the WGAN loss, without any explicit regularization.
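
The opponent-aware update at the heart of CGD can be illustrated on a toy problem. The sketch below is not the paper's WGAN-GP/CIFAR10 experiment; it applies the published zero-sum CGD update rule (Schäfer and Anandkumar, Competitive Gradient Descent, 2019) to a small bilinear game in NumPy. The matrix A, the step size eta, and the problem dimension are arbitrary illustrative choices.

```python
# Minimal sketch (not the paper's experiment): zero-sum competitive gradient
# descent (CGD) on the toy bilinear game f(x, y) = x^T A y, where one player
# (the "generator") minimizes f and the other (the "discriminator") maximizes it.
# For this game, simultaneous gradient descent-ascent spirals away from the
# equilibrium at the origin, while the opponent-aware CGD update contracts toward it.
import numpy as np

rng = np.random.default_rng(0)
n, eta = 3, 0.2
A = rng.standard_normal((n, n))   # mixed second derivative D^2_xy f of the bilinear game
x = rng.standard_normal(n)        # "generator" parameters
y = rng.standard_normal(n)        # "discriminator" parameters
I = np.eye(n)

for step in range(200):
    grad_x = A @ y                # nabla_x f
    grad_y = A.T @ x              # nabla_y f
    # Zero-sum CGD update: each player anticipates the other's gradient
    # response through the mixed Hessian, which is what the abstract
    # calls opponent-aware modeling.
    dx = -eta * np.linalg.solve(I + eta**2 * A @ A.T, grad_x + eta * A @ grad_y)
    dy = eta * np.linalg.solve(I + eta**2 * A.T @ A, grad_y - eta * A.T @ grad_x)
    x, y = x + dx, y + dy

print("distance to equilibrium after CGD:", np.linalg.norm(np.concatenate([x, y])))
```

The design choice behind the update is that each player moves to the Nash point of a regularized bilinear approximation of the game rather than following its own gradient alone; in GAN training this anticipation of the opponent is what strengthens ICR without an explicit gradient penalty.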
