Solving Approximate Wasserstein GANs to Stationarity

Generative Adversarial Networks (GANs) are one of the most practical strategies for learning data distributions. A popular GAN formulation uses the Wasserstein distance as a metric between probability distributions. Unfortunately, minimizing the Wasserstein distance between the data distribution and the generative model's distribution is challenging: the objective is non-convex, non-smooth, and even hard to compute. In this work, we propose to use a smooth approximation of the Wasserstein GAN objective. We show that this smooth approximation is close to the original objective. Moreover, gradient information for this approximate formulation is computationally cheap to obtain, so first-order optimization methods can be applied directly. Based on this observation, we propose a class of algorithms with guaranteed convergence to stationarity. Unlike for the original non-smooth objective, our algorithms only require solving the discriminator problem to approximate optimality. We apply our method to learning Gaussian mixtures on a grid and to learning MNIST digits. Our method allows the use of powerful cost functions based on latent representations of the data, where the latent representation can itself be optimized adversarially.
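The sketch below is not the authors' implementation; it only illustrates the general idea of replacing the non-smooth Wasserstein objective with a smooth, differentiable surrogate whose minibatch gradients are cheap, so the generator can be trained with an ordinary first-order method. The particular smoothing used here (an entropy-regularized transport cost computed by Sinkhorn iterations), the regularization strength `eps`, the toy `Generator` network, and the synthetic "real" data are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (PyTorch): train a generator by descending a smooth,
# entropy-regularized surrogate of the Wasserstein distance between minibatches.
import math
import torch

def sinkhorn_loss(x, y, eps=0.1, n_iter=50):
    """Entropy-regularized optimal-transport cost between empirical measures on x and y."""
    cost = torch.cdist(x, y, p=2) ** 2                   # squared-Euclidean ground cost
    n, m = cost.shape
    log_mu = torch.full((n,), -math.log(n))               # uniform weights on samples
    log_nu = torch.full((m,), -math.log(m))
    f, g = torch.zeros(n), torch.zeros(m)                 # dual (Sinkhorn) potentials
    for _ in range(n_iter):                               # log-domain Sinkhorn updates
        f = -eps * torch.logsumexp((g[None, :] - cost) / eps + log_nu[None, :], dim=1)
        g = -eps * torch.logsumexp((f[:, None] - cost) / eps + log_mu[:, None], dim=0)
    plan = torch.exp((f[:, None] + g[None, :] - cost) / eps
                     + log_mu[:, None] + log_nu[None, :])
    return torch.sum(plan * cost)                         # transport cost under the smoothed plan

class Generator(torch.nn.Module):
    def __init__(self, z_dim=2, out_dim=2):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(z_dim, 64), torch.nn.ReLU(), torch.nn.Linear(64, out_dim))
    def forward(self, z):
        return self.net(z)

gen = Generator()
opt = torch.optim.SGD(gen.parameters(), lr=1e-2)
for step in range(200):
    real = torch.randn(128, 2) + torch.tensor([4.0, 0.0])  # toy "data" distribution
    fake = gen(torch.randn(128, 2))
    loss = sinkhorn_loss(real, fake)                       # smooth surrogate of W(data, model)
    opt.zero_grad()
    loss.backward()                                        # gradient of the smooth objective
    opt.step()
```

The paper itself works with a discriminator (dual) formulation that only needs to be solved to approximate optimality; differentiating through unrolled Sinkhorn iterations, as above, is simply one convenient way to obtain a smooth, differentiable stand-in for the Wasserstein cost on minibatches.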
