Adversarial Fisher Vectors for Unsupervised Representation Learning

We examine Generative Adversarial Networks (GANs) through the lens of deep Energy-Based Models (EBMs), with the goal of exploiting the density model that follows from this formulation. In the traditional view, the discriminator learns a constant function at convergence; here we show instead that it can provide useful information for downstream tasks, e.g., feature extraction for classification. Concretely, in the EBM formulation the discriminator learns an unnormalized density function (i.e., the negative energy term) that characterizes the data manifold. We propose to evaluate both the generator and the discriminator by deriving the corresponding Fisher Score and Fisher Information from the EBM. We show that, by assuming that the generated examples form an estimate of the learned density, both the Fisher Information and the normalized Fisher Vectors are easy to compute. We also show that this yields a distance metric between examples and between sets of examples. Our experiments show that the GAN-induced Fisher Vectors achieve competitive performance as unsupervised feature extractors for classification and perceptual similarity tasks. Code is available at https://github.com/apple/ml-afv.
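The computation described above can be illustrated with a minimal NumPy sketch. It is not the released implementation, and it rests on several assumptions that are not stated in the abstract: the density is taken to be p(x) ∝ exp(D(x; θ)), so the Fisher Score of an example is the gradient of D with respect to θ minus the mean of that gradient under the model; the model expectation is estimated from generated samples; the Fisher Information is approximated by its diagonal; and the discriminator is a hypothetical toy function `D(x; theta) = theta . [x, x**2]`, chosen only so that its parameter gradient has a closed form. The names `grad_D` and `fisher_vectors` are illustrative.

```python
import numpy as np


def grad_D(x):
    """Gradient of the toy discriminator D(x; theta) = theta . [x, x**2]
    with respect to theta. The model is linear in theta, so the gradient
    does not depend on theta. Input (N, d), output (N, 2d)."""
    return np.concatenate([x, x ** 2], axis=-1)


def fisher_vectors(x, generated, eps=1e-8):
    """Normalized Fisher Vectors for examples x, using generated samples
    as a stand-in for samples from the learned density (assumption)."""
    g_gen = grad_D(generated)                      # (M, P)
    mean_gen = g_gen.mean(axis=0)                  # estimate of E_p[grad D]
    scores_gen = g_gen - mean_gen                  # Fisher Scores of generated samples
    fisher_diag = (scores_gen ** 2).mean(axis=0)   # diagonal Fisher Information (assumed approximation)
    scores_x = grad_D(x) - mean_gen                # Fisher Scores of the query examples
    return scores_x / np.sqrt(fisher_diag + eps)   # scale by inverse square root of the diagonal


rng = np.random.default_rng(0)
generated = rng.normal(size=(1000, 3))  # placeholder for generator output
x = rng.normal(size=(5, 3))             # placeholder for real examples

v = fisher_vectors(x, generated)
v_gen = fisher_vectors(generated, generated)

# Distance between two examples: Euclidean distance between their Fisher Vectors.
dist_01 = np.linalg.norm(v[0] - v[1])
print(v.shape, dist_01)
```

In this sketch the Fisher Vectors of the generated samples have zero mean and approximately unit second moment in every coordinate, which is what the normalization is meant to achieve. With a real discriminator, `grad_D` would be replaced by per-example gradients obtained through automatic differentiation.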
