Convergence and Sample Complexity of SGD in GANs

We provide theoretical convergence guarantees for training Generative Adversarial Networks (GANs) via SGD. We consider learning a target distribution modeled by a one-layer Generator network with a non-linear activation function $\phi(\cdot)$ parametrized by a $d \times d$ weight matrix $\mathbf W_*$, i.e., $f_*(\mathbf x) = \phi(\mathbf W_* \mathbf x)$. Our main result is that training the Generator together with a Discriminator via the Stochastic Gradient Descent-Ascent (SGDA) iteration proposed by Goodfellow et al. yields a Generator distribution that approaches the target distribution of $f_*$. Specifically, we can learn the target distribution within total-variation distance $\epsilon$ using $\tilde O(d^2/\epsilon^2)$ samples, which is (near-)information-theoretically optimal. Our results apply to a broad class of non-linear activation functions $\phi$, including ReLUs; they are enabled by a connection with truncated statistics and an appropriate design of the Discriminator network. Our approach relies on a bilevel optimization framework to show that vanilla SGDA works.
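
To make the training dynamics concrete, below is a minimal PyTorch sketch of the alternating SGDA iteration for a one-layer ReLU Generator. The linear discriminator, dimensions, step sizes, and batch size here are illustrative assumptions for exposition only; the paper's analysis relies on a specifically designed Discriminator rather than this generic choice.

```python
# Minimal SGDA sketch for learning f_*(x) = relu(W_* x) with a one-layer
# generator. The linear discriminator and all hyperparameters are
# illustrative assumptions, not the construction analyzed in the paper.
import torch

torch.manual_seed(0)
d = 8
W_star = torch.randn(d, d)                 # unknown target weight matrix
W = torch.randn(d, d, requires_grad=True)  # generator parameters
v = torch.zeros(d, requires_grad=True)     # toy linear discriminator

gen_opt = torch.optim.SGD([W], lr=1e-3)    # descent player (generator)
dis_opt = torch.optim.SGD([v], lr=1e-3)    # ascent player (discriminator)

def gap():
    """Fresh-minibatch estimate of the discriminator's real-vs-fake gap."""
    real = torch.relu(torch.randn(256, d) @ W_star.T)  # target samples
    fake = torch.relu(torch.randn(256, d) @ W.T)       # generator samples
    return (real @ v).mean() - (fake @ v).mean()

for step in range(10_000):
    # Ascent step: the discriminator widens the gap.
    dis_opt.zero_grad()
    (-gap()).backward()
    dis_opt.step()

    # Descent step: the generator shrinks the gap.
    gen_opt.zero_grad()
    gap().backward()
    gen_opt.step()
```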

[1] Ian J. Goodfellow, et al. Generative Adversarial Nets, 2014, NIPS.

[2] Constantinos Daskalakis, et al. The Limit Points of (Optimistic) Gradient Descent in Min-Max Optimization, 2018, NeurIPS.

[3] T. Sanders, et al. Analysis of Boolean Functions, 2012, ArXiv.

[4] Ryan O'Donnell, et al. Analysis of Boolean Functions, 2014, ArXiv.

[5] Sebastian Nowozin, et al. Which Training Methods for GANs do actually Converge?, 2018, ICML.

[6] Mingrui Liu, et al. Non-Convex Min-Max Optimization: Provable Algorithms and Applications in Machine Learning, 2018, ArXiv.

[7] Christos Tzamos, et al. Efficient Statistics, in High Dimensions, from Truncated Samples, 2018, FOCS.

[8] Constantinos Daskalakis, et al. Last-Iterate Convergence: Zero-Sum Games and Constrained Min-Max Optimization, 2018, ITCS.

[9] Lin Yang, et al. Photographic Text-to-Image Synthesis with a Hierarchically-Nested Adversarial Network, 2018, CVPR.

[10] Georgios Piliouras, et al. Poincaré Recurrence, Cycles and Spurious Equilibria in Gradient-Descent-Ascent for Non-Convex Non-Concave Zero-Sum Games, 2019, NeurIPS.

[11] J. Zico Kolter, et al. Gradient descent GAN optimization is locally stable, 2017, NIPS.

[12] Sridhar Mahadevan, et al. Global Convergence to the Equilibrium of GANs using Variational Inequalities, 2018, ArXiv.

[13] Saeed Ghadimi, et al. Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization, 2013, Mathematical Programming.

[14] Taesung Park, et al. CyCADA: Cycle-Consistent Adversarial Domain Adaptation, 2017, ICML.

[15] Léon Bottou, et al. Towards Principled Methods for Training Generative Adversarial Networks, 2017, ICLR.

[16] A. Carbery, et al. Distributional and $L^q$ norm inequalities for polynomials over convex bodies in $\mathbb{R}^n$, 2001.

[17] Hao Wang, et al. Unsupervised Graph Representation Learning With Variable Heat Kernel, 2020, IEEE Access.

[18] Nicholas J. A. Harvey, et al. Simple and optimal high-probability bounds for strongly-convex stochastic gradient descent, 2019, ArXiv.

[19] Kamalika Chaudhuri, et al. Approximation and Convergence Properties of Generative Adversarial Learning, 2017, NIPS.

[20] Nicholas J. A. Harvey, et al. Tight Analyses for Non-Smooth Stochastic Gradient Descent, 2018, COLT.

[21] Yingyu Liang, et al. Generalization and Equilibrium in Generative Adversarial Nets (GANs), 2017, ICML.

[22] Michael I. Jordan, et al. On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems, 2019, ICML.

[23] David Pfau, et al. Unrolled Generative Adversarial Networks, 2016, ICLR.

[24] Fei Xia, et al. Understanding GANs: the LQG Setting, 2017, ArXiv.

[25] Yurii Nesterov, et al. Introductory Lectures on Convex Optimization - A Basic Course, 2014, Applied Optimization.

[26] Soumith Chintala, et al. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, 2015, ICLR.

[27] J. Danskin. The Theory of Max-Min and its Application to Weapons Allocation Problems, 1967.

[28] Jacob D. Abernethy, et al. How to Train Your DRAGAN, 2017, ArXiv.

[29] Alexandros G. Dimakis, et al. SGD Learns One-Layer Networks in WGANs, 2019, ICML.