Variational (Gradient) Estimate of the Score Function in Energy-based Latent Variable Models

Learning and evaluating energy-based latent variable models (EBLVMs) without structural assumptions is highly challenging, because the true posteriors and the partition functions of such models are generally intractable. This paper presents variational estimates of the score function and of its gradient with respect to the model parameters in a general EBLVM, referred to as VaES and VaGES respectively. The variational posterior is trained to minimize a certain divergence to the true model posterior, and the bias of both estimates can be bounded theoretically by this divergence. Under a minimal model assumption, VaES and VaGES can be applied to kernelized Stein discrepancy (KSD)- and score matching (SM)-based methods for learning EBLVMs. In addition, VaES can be used to estimate the exact Fisher divergence between the data and general EBLVMs.
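To make the setting concrete, here is a minimal sketch of the identity underlying the score estimate, written in our own notation (the paper's exact formulation may differ). Write the EBLVM over visible variables v and latent variables h as p_\theta(v, h) = \exp(-E_\theta(v, h)) / Z(\theta). Since the partition function Z(\theta) does not depend on v, the score of the marginal satisfies

    \nabla_v \log p_\theta(v)
      = \mathbb{E}_{p_\theta(h \mid v)}\left[ \nabla_v \log p_\theta(v, h) \right]
      = -\mathbb{E}_{p_\theta(h \mid v)}\left[ \nabla_v E_\theta(v, h) \right].

Replacing the intractable posterior p_\theta(h \mid v) with a learned variational posterior q_\phi(h \mid v), and averaging -\nabla_v E_\theta(v, h) over samples h \sim q_\phi(h \mid v), yields a tractable Monte Carlo estimate of the score. The bias introduced by this substitution is controlled by the divergence between q_\phi(h \mid v) and p_\theta(h \mid v), which is exactly the quantity the variational posterior is trained to minimize.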
