Towards a Better Global Loss Landscape of GANs

Our understanding of GAN training is still very limited. One major challenge is its non-convex-non-concave min-max objective, which may lead to sub-optimal local minima. In this work, we perform a global landscape analysis of the empirical loss of GANs. We prove that a class of separable-GANs, including the original JS-GAN, has exponentially many bad basins, which manifest as mode collapse. We also study the relativistic pairing GAN (RpGAN) loss, which couples the generated samples and the true samples, and we prove that RpGAN has no bad basins. Experiments on synthetic data show that the predicted bad basins can indeed appear in training. We also perform experiments supporting our theory that RpGAN has a better landscape than separable-GANs: for instance, we empirically show that RpGAN performs better than separable-GANs with relatively narrow neural nets. The code is available at this https URL.
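To make the contrast concrete, here is a minimal sketch of the two discriminator losses in logit form (a hypothetical PyTorch-style illustration with our own naming, not the released code): the separable loss scores real and generated samples through two independent terms, whereas the RpGAN loss scores each generated sample only relative to a paired true sample.

import torch.nn.functional as F

# Separable (JS-GAN) discriminator loss: the real and fake logits enter
# through two independent terms, so each term can be driven down separately.
def separable_d_loss(d_real, d_fake):
    return F.softplus(-d_real).mean() + F.softplus(d_fake).mean()

# Relativistic pairing (RpGAN) discriminator loss: each fake logit is
# coupled with a real logit, and only their difference matters.
def rpgan_d_loss(d_real, d_fake):
    return F.softplus(-(d_real - d_fake)).mean()

Here d_real and d_fake are batches of discriminator logits on true and generated samples, respectively; for RpGAN the generator loss is obtained by swapping the two arguments.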
