MMD GAN: Towards Deeper Understanding of Moment Matching Network

Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during the training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as the replacement of a fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD GAN. The new distance measure in MMD GAN is a meaningful loss that enjoys the advantage of weak topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR- 10, CelebA and LSUN, the performance of MMD-GAN significantly outperforms GMMN, and is competitive with other representative GAN works.

[1]  Bernhard Schölkopf,et al.  Kernel Choice and Classifiability for RKHS Embeddings of Probability Distributions , 2009, NIPS.

[2]  Geoffrey E. Hinton,et al.  Deep Boltzmann Machines , 2009, AISTATS.

[3]  Andrew Gordon Wilson,et al.  Deep Kernel Learning , 2015, AISTATS.

[4]  Larry Wasserman,et al.  All of Statistics: A Concise Course in Statistical Inference , 2004 .

[5]  Bernhard Schölkopf,et al.  Kernel Mean Embedding of Distributions: A Review and Beyonds , 2016, Found. Trends Mach. Learn..

[6]  Max Welling,et al.  Auto-Encoding Variational Bayes , 2013, ICLR.

[7]  Andrea Vedaldi,et al.  Adversarial Generator-Encoder Networks , 2017, ArXiv.

[8]  Sivaraman Balakrishnan,et al.  Optimal kernel choice for large-scale two-sample tests , 2012, NIPS.

[9]  Xiaogang Wang,et al.  Deep Learning Face Attributes in the Wild , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Bernhard Schölkopf,et al.  A Kernel Two-Sample Test , 2012, J. Mach. Learn. Res..

[11]  Edwin Lughofer,et al.  Central Moment Discrepancy (CMD) for Domain-Invariant Representation Learning , 2017, ICLR.

[12]  Klaus Obermayer,et al.  Support vector learning for ordinal regression , 1999 .

[13]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[14]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[15]  Bernhard Schölkopf,et al.  Hilbert Space Embeddings and Metrics on Probability Measures , 2009, J. Mach. Learn. Res..

[16]  David Berthelot,et al.  BEGAN: Boundary Equilibrium Generative Adversarial Networks , 2017, ArXiv.

[17]  Yinda Zhang,et al.  LSUN: Construction of a Large-scale Image Dataset using Deep Learning with Humans in the Loop , 2015, ArXiv.

[18]  Yoshua Bengio,et al.  Improving Generative Adversarial Networks with Denoising Feature Matching , 2016, ICLR.

[19]  Yu Cheng,et al.  Generative Adversarial Networks as Variational Training of Energy Based Models , 2016, ArXiv.

[20]  Vaibhava Goel,et al.  McGan: Mean and Covariance Feature Matching GAN , 2017, ICML.

[21]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[22]  Alexander J. Smola,et al.  Generative Models and Model Criticism via Optimized Maximum Mean Discrepancy , 2016, ICLR.

[23]  Sebastian Nowozin,et al.  f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization , 2016, NIPS.

[24]  Soumith Chintala,et al.  Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks , 2015, ICLR.

[25]  Richard S. Zemel,et al.  Generative Moment Matching Networks , 2015, ICML.

[26]  Zoubin Ghahramani,et al.  Training generative neural networks via Maximum Mean Discrepancy optimization , 2015, UAI.

[27]  Aaron C. Courville,et al.  Adversarially Learned Inference , 2016, ICLR.

[28]  Marc G. Bellemare,et al.  The Cramer Distance as a Solution to Biased Wasserstein Gradients , 2017, ArXiv.

[29]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[30]  Andrea Vedaldi,et al.  It Takes (Only) Two: Adversarial Generator-Encoder Networks , 2017, AAAI.

[31]  Yann LeCun,et al.  Energy-based Generative Adversarial Network , 2016, ICLR.

[32]  Yingyu Liang,et al.  Generalization and Equilibrium in Generative Adversarial Nets (GANs) , 2017, ICML.

[33]  Aaron C. Courville,et al.  Improved Training of Wasserstein GANs , 2017, NIPS.

[34]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.