Catastrophic forgetting and mode collapse in GANs

In this paper, we show that Generative Adversarial Networks (GANs) suffer from catastrophic forgetting even when they are trained to approximate a single target distribution. We show that GAN training is a continual learning problem in which the sequence of changing model distributions forms the sequence of tasks presented to the discriminator. The level of mismatch between consecutive tasks in this sequence determines the level of forgetting. Catastrophic forgetting is interrelated with mode collapse and can make GAN training non-convergent. We investigate the landscape of the discriminator’s output in different variants of GANs and find that when a GAN converges to a good equilibrium, real training datapoints are wide local maxima of the discriminator’s output. We empirically show the relationship between the sharpness of these local maxima and both mode collapse and generalization in GANs. We show how catastrophic forgetting prevents the discriminator from making real datapoints local maxima, and thus causes non-convergence. Finally, we study methods for preventing catastrophic forgetting in GANs.
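
The claim that real datapoints should be wide local maxima of the discriminator’s output can be probed empirically. The sketch below is a minimal illustration, assuming a PyTorch discriminator `D` that maps inputs to scalar outputs and a batch `x_real` of real samples (both names are placeholders, not taken from the paper): it estimates sharpness by measuring how much the discriminator’s output drops under small random perturbations of real inputs, a flatness proxy rather than the paper’s exact measure.

```python
import torch

@torch.no_grad()
def local_sharpness(D, x_real, radius=0.05, n_dirs=10):
    """Estimate how sharply D's output falls off around real datapoints.

    For each real sample, draw random perturbations with a fixed L2 norm
    and measure the average drop in the discriminator's output. Small
    drops indicate wide (flat) local maxima; large drops indicate sharp
    ones. This is an illustrative probe, not the paper's exact measure.
    """
    d_center = D(x_real).flatten()            # discriminator output at the real points
    drops = torch.zeros_like(d_center)
    for _ in range(n_dirs):
        noise = torch.randn_like(x_real)
        norms = noise.flatten(1).norm(dim=1)  # per-sample L2 norm of the perturbation
        scale = radius / (norms + 1e-12)
        noise = noise * scale.view(-1, *([1] * (x_real.dim() - 1)))
        drops += d_center - D(x_real + noise).flatten()
    return (drops / n_dirs).mean().item()     # mean output drop per sample
```

Tracking this statistic over training, alongside sample quality, indicates whether real datapoints remain wide maxima of the discriminator; a rapidly growing drop suggests the maxima are narrowing.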
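The final sentence points to continual-learning remedies for the discriminator. One standard option, an Elastic Weight Consolidation (EWC) style penalty that treats earlier model distributions as past tasks, can be sketched as a regularizer added to the discriminator loss. The helper below is a hypothetical illustration, assuming a PyTorch discriminator plus stored snapshots `old_params` and per-parameter importance weights `fisher` from an earlier stage of training; none of these names come from the paper.

```python
import torch

def ewc_penalty(D, old_params, fisher, lam=1.0):
    """EWC-style regularizer for the discriminator.

    Penalizes movement of each discriminator parameter away from its value
    at an earlier point in training (`old_params`), weighted by an estimate
    of that parameter's importance (`fisher`, e.g. squared gradients of the
    discriminator loss averaged over earlier real/fake batches).
    """
    penalty = 0.0
    for name, p in D.named_parameters():
        if name in old_params:
            penalty = penalty + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * penalty

# Typical use inside the discriminator update (sketch):
#   d_loss = gan_loss(D, x_real, x_fake) + ewc_penalty(D, old_params, fisher)
#   d_loss.backward(); opt_D.step()
```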
