On the Convergence of Gradient Descent in GANs: MMD GAN As a Gradient Flow

We consider the maximum mean discrepancy ($\mathrm{MMD}$) GAN problem and propose a parametric kernelized gradient flow that mimics the min-max game in gradient-regularized $\mathrm{MMD}$ GAN. We show that this flow provides a descent direction minimizing the $\mathrm{MMD}$ on a statistical manifold of probability distributions. We then derive an explicit condition which ensures that gradient descent on the parameter space of the generator in gradient-regularized $\mathrm{MMD}$ GAN is globally convergent to the target distribution. Under this condition, we give non-asymptotic convergence results for gradient descent in $\mathrm{MMD}$ GAN. A further contribution of this paper is a dynamic formulation of a regularization of the $\mathrm{MMD}$, together with a demonstration that the parametric kernelized descent for the $\mathrm{MMD}$ is the gradient flow of this functional with respect to the new Riemannian structure. This result allows one to treat gradient flows of quite general functionals and thus has potential applications to other types of variational inference on statistical manifolds beyond GANs. Finally, numerical experiments suggest that our parametric kernelized gradient flow stabilizes GAN training and promotes convergence.
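
For concreteness, the squared $\mathrm{MMD}$ between two distributions $P$ and $Q$ with a characteristic kernel $k$ admits the standard closed form below; the notation ($P_{\mathrm{data}}$, latent distribution $\nu$, generator $G_\theta$) is introduced here only as an illustrative sketch of the objective on which gradient descent is performed in the generator's parameter space, not as the paper's gradient-regularized variant:

$$\mathrm{MMD}_k^2(P, Q) \;=\; \mathbb{E}_{X, X' \sim P}\bigl[k(X, X')\bigr] \;-\; 2\,\mathbb{E}_{X \sim P,\, Y \sim Q}\bigl[k(X, Y)\bigr] \;+\; \mathbb{E}_{Y, Y' \sim Q}\bigl[k(Y, Y')\bigr],$$

$$\min_{\theta} \; \mathcal{L}(\theta) \;=\; \mathrm{MMD}_k^2\bigl(P_{\mathrm{data}},\, (G_\theta)_{\#}\nu\bigr).$$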
