On the Convergence and Robustness of Training GANs with Regularized Optimal Transport

Generative Adversarial Networks (GANs) are among the most practical methods for learning data distributions. A popular GAN formulation is based on using the Wasserstein distance as a metric between probability distributions. Unfortunately, minimizing the Wasserstein distance between the data distribution and the generative model distribution is computationally challenging: the objective is non-convex, non-smooth, and even hard to compute. In this work, we show that obtaining gradient information for the smoothed Wasserstein GAN formulation, which is based on regularized Optimal Transport (OT), is computationally cheap, so first-order optimization methods can be applied to minimize this objective. Consequently, we establish a theoretical convergence guarantee to stationarity for a proposed class of GAN optimization algorithms. Unlike the original non-smooth formulation, our algorithm requires solving the discriminator problem only to approximate optimality. We apply our method to learning MNIST digits as well as CIFAR-10 images. Our experiments show that the method is computationally efficient and generates images comparable to those of state-of-the-art algorithms given the same architecture and computational budget.
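As a rough illustration of the regularized OT objective described above, the following NumPy sketch computes an entropy-regularized transport cost between a data batch and a generated batch via Sinkhorn iterations. The function name sinkhorn_ot, the regularization strength eps, the iteration count, and the toy batches are illustrative assumptions, not the paper's implementation; the point is that the smoothed cost is cheap to evaluate and differentiable in the generator samples, so its gradient can be pushed back to the generator parameters with any first-order method.

    import numpy as np

    def sinkhorn_ot(x, y, eps=0.1, n_iters=200):
        """Entropy-regularized OT cost between two empirical batches.

        x: (n, d) samples from the data distribution
        y: (m, d) samples from the generator
        eps: entropic regularization strength (hypothetical default)
        Returns the smoothed transport cost and the transport plan.
        """
        n, m = x.shape[0], y.shape[0]
        # Pairwise squared-Euclidean ground cost.
        C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
        K = np.exp(-C / eps)                  # Gibbs kernel
        a = np.full(n, 1.0 / n)               # uniform marginal on data batch
        b = np.full(m, 1.0 / m)               # uniform marginal on generated batch
        u, v = np.ones(n), np.ones(m)
        for _ in range(n_iters):              # Sinkhorn fixed-point updates
            u = a / (K @ v)
            v = b / (K.T @ u)
        P = u[:, None] * K * v[None, :]       # optimal regularized transport plan
        return np.sum(P * C), P

    # Toy usage: smoothed cost between a data batch and a "generated" batch.
    rng = np.random.default_rng(0)
    data_batch = rng.normal(size=(64, 2))
    fake_batch = rng.normal(loc=0.5, size=(64, 2))
    cost, plan = sinkhorn_ot(data_batch, fake_batch)
    print(cost)

In a GAN training loop one would evaluate this smoothed cost on minibatches and differentiate through it (or through the dual potentials) with respect to the generator's parameters; that differentiability is what distinguishes the regularized objective from the original non-smooth Wasserstein formulation.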
