PolicyGAN: Training generative adversarial networks using policy gradient

This paper presents PolicyGAN, a policy-gradient paradigm for training Generative Adversarial Networks that views the generator as an image-generating neural agent rewarded by another neural agent, the discriminator. Rewards are higher for samples that lie near the original data manifold. In PolicyGAN, only the reward signal from the output of the discriminator is used to update the generator via policy gradient. This obviates the need for gradients to flow through the discriminator when training the generator, an intrinsic requirement of the original GAN formulation. Given the inherent difficulty of training adversarial models and the slow convergence of policy-gradient methods, training GANs with policy gradient is a non-trivial problem that warrants careful study. To date, GANs have relied exclusively on differentiable discriminators. PolicyGAN opens up the possibility of training GANs with a wide variety of non-differentiable discriminator networks, which is not possible in the original GAN framework. A further advantage of using policy gradient is that the generator need not produce deterministic samples; it can instead output a probability distribution from which samples are drawn. PolicyGAN thus paves the way for using a variety of probabilistic generator models.
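The central idea, updating the generator from the discriminator's output treated purely as a scalar reward, can be illustrated with a minimal REINFORCE-style sketch. The code below is an illustrative assumption rather than the authors' implementation: it assumes a PyTorch setup, a generator that outputs per-pixel Bernoulli probabilities (so sampling is explicit and non-differentiable), and the discriminator score used only as a reward, with no gradient flowing through the discriminator.

    # Minimal sketch (assumed setup, not the paper's implementation):
    # a REINFORCE-style generator update where the discriminator output
    # is treated as a scalar reward, so no gradient flows through it.
    import torch
    import torch.nn as nn

    class Generator(nn.Module):
        def __init__(self, z_dim=64, img_dim=784):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(z_dim, 256), nn.ReLU(),
                nn.Linear(256, img_dim), nn.Sigmoid(),  # per-pixel Bernoulli probabilities
            )

        def forward(self, z):
            return self.net(z)

    class Discriminator(nn.Module):
        def __init__(self, img_dim=784):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(img_dim, 256), nn.LeakyReLU(0.2),
                nn.Linear(256, 1), nn.Sigmoid(),  # probability that the input is real
            )

        def forward(self, x):
            return self.net(x)

    z_dim, img_dim, batch = 64, 784, 32
    G, D = Generator(z_dim, img_dim), Discriminator(img_dim)
    opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)

    # One policy-gradient step for the generator
    z = torch.randn(batch, z_dim)
    probs = G(z)                                   # stochastic policy over pixels
    dist = torch.distributions.Bernoulli(probs)
    samples = dist.sample()                        # sampled images (non-differentiable)
    with torch.no_grad():                          # reward only: no gradient through D
        reward = D(samples).squeeze(1)
    baseline = reward.mean()                       # simple variance-reduction baseline
    log_prob = dist.log_prob(samples).sum(dim=1)   # log-probability of each sampled image
    loss_g = -((reward - baseline) * log_prob).mean()  # REINFORCE objective

    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

Because the discriminator is only queried for a scalar reward inside torch.no_grad(), it could in principle be replaced by any black-box scorer, which is the flexibility the abstract highlights.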
