Stabilizing Adversarial Nets With Prediction Methods

Adversarial neural networks solve many important problems in data science, but are notoriously difficult to train. These difficulties come from the fact that optimal weights for adversarial nets correspond to saddle points, and not minimizers, of the loss function. The alternating stochastic gradient methods typically used for such problems do not reliably converge to saddle points, and when convergence does happen it is often highly sensitive to learning rates. We propose a simple modification of stochastic gradient descent that stabilizes adversarial networks. We show, both in theory and practice, that the proposed method reliably converges to saddle points, and is stable with a wider range of training parameters than a non-prediction method. This makes adversarial networks less likely to "collapse," and enables faster training with larger learning rates.
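The abstract does not spell out the modification, so the following is only a minimal sketch under stated assumptions. It assumes the "prediction" is a lookahead step: after the minimizing variable is updated, it is extrapolated along its most recent update, and the maximizing variable is then updated against that extrapolated point. The toy objective L(u, v) = u * v (saddle point at the origin), the function name `run`, the step size, and the iteration count are all illustrative choices, not taken from the paper.

```python
import math

def run(steps, lr, predict, u=1.0, v=1.0):
    """Alternating gradient descent (on u) / ascent (on v) for L(u, v) = u * v.

    Returns the Euclidean distance of the final iterate from the saddle (0, 0).
    """
    for _ in range(steps):
        u_new = u - lr * v  # descent step on u, since dL/du = v
        # Assumed prediction step: extrapolate u along its latest update.
        u_bar = u_new + (u_new - u) if predict else u_new
        v = v + lr * u_bar  # ascent step on v, since dL/dv = u
        u = u_new
    return math.hypot(u, v)

dist_plain = run(steps=2000, lr=0.1, predict=False)
dist_pred = run(steps=2000, lr=0.1, predict=True)
print(f"without prediction: distance to saddle = {dist_plain:.3g}")
print(f"with prediction:    distance to saddle = {dist_pred:.3g}")
```

On this toy problem the plain alternating iterates orbit the saddle point without approaching it, while the iterates with the lookahead step spiral inward. This illustrates the claimed stabilizing effect only for a bilinear objective; it says nothing about the behavior on real adversarial networks.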
