Bi-level Score Matching for Learning Energy-based Latent Variable Models

Score matching (SM) provides a compelling approach to learning energy-based models (EBMs) because it avoids computing the partition function. However, learning energy-based latent variable models (EBLVMs) with SM remains largely open, except in some special cases. This paper presents a bi-level score matching (BiSM) method that learns EBLVMs with general structures by reformulating SM as a bi-level optimization problem. The higher level introduces a variational posterior over the latent variables and optimizes a modified SM objective, and the lower level optimizes the variational posterior to fit the true posterior. To solve BiSM efficiently, we develop a stochastic optimization algorithm with gradient unrolling. Theoretically, we analyze the consistency of BiSM and the convergence of the stochastic algorithm. Empirically, we show the promise of BiSM on Gaussian restricted Boltzmann machines and on highly nonstructural EBLVMs parameterized by deep convolutional neural networks. BiSM is comparable to the widely adopted contrastive divergence and SM methods when they are applicable, and it can learn complex EBLVMs with intractable posteriors to generate natural images.
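
The abstract describes a two-level scheme: a higher-level score-matching objective over the EBM parameters and a lower-level variational fit of the posterior, coupled through gradient unrolling. Below is a minimal, hedged PyTorch sketch of that bi-level structure only. It assumes a toy MLP energy, a diagonal-Gaussian variational posterior, a sliced-score-matching-style outer objective, and a KL-style inner objective; the names (energy_net, q_params, K, inner_lr) and the specific losses are illustrative stand-ins, not the paper's exact formulation.

```python
# Minimal, illustrative sketch of a bi-level score-matching loop with gradient
# unrolling in PyTorch. Architectures and objectives are simplified stand-ins.

import torch
import torch.nn as nn

D_V, D_H = 4, 2  # toy visible / latent dimensions

# Energy E_theta(v, h): a small MLP over the concatenated (v, h).
energy_net = nn.Sequential(nn.Linear(D_V + D_H, 32), nn.Tanh(), nn.Linear(32, 1))

def energy(v, h):
    return energy_net(torch.cat([v, h], dim=-1)).squeeze(-1)

# Variational posterior q_phi(h | v): a diagonal Gaussian with explicit
# parameter tensors, so the inner updates can be unrolled functionally.
q_params = [(0.1 * torch.randn(D_V, D_H)).requires_grad_(),  # W_mu
            torch.zeros(D_H, requires_grad=True),            # b_mu
            torch.zeros(D_H, requires_grad=True)]            # log_std

def q_sample(v, params):
    W_mu, b_mu, log_std = params
    mu = v @ W_mu + b_mu
    h = mu + torch.randn_like(mu) * log_std.exp()  # reparameterized sample
    log_q = (-0.5 * ((h - mu) / log_std.exp()) ** 2 - log_std).sum(-1)  # up to a constant
    return h, log_q

def inner_loss(v, params):
    # Lower level: fit q_phi to the true posterior p_theta(h | v).
    # Up to a phi-independent constant, KL(q || p) = E_q[E_theta(v, h) + log q_phi(h | v)].
    h, log_q = q_sample(v, params)
    return (energy(v, h) + log_q).mean()

def outer_loss(v, params):
    # Higher level: a sliced-score-matching-style objective on the approximate
    # marginal score  s(v) ~ -grad_v E_theta(v, h)  with  h ~ q_phi(h | v).
    v = v.detach().requires_grad_(True)
    h, _ = q_sample(v.detach(), params)   # h is held fixed with respect to v
    score = -torch.autograd.grad(energy(v, h).sum(), v, create_graph=True)[0]
    u = torch.randn_like(v)               # random projection direction
    grad_su = torch.autograd.grad((score * u).sum(), v, create_graph=True)[0]
    return (0.5 * (score ** 2).sum(-1) + (grad_su * u).sum(-1)).mean()

opt_theta = torch.optim.Adam(energy_net.parameters(), lr=1e-3)
opt_phi = torch.optim.Adam(q_params, lr=1e-3)
K, inner_lr = 2, 1e-2                     # unrolling depth and inner step size

for step in range(200):
    v = torch.randn(64, D_V)              # stand-in for a data minibatch

    # Unroll K inner gradient steps on phi, keeping the graph so that the
    # higher-level gradient w.r.t. theta sees how phi responds to theta.
    phi = list(q_params)
    for _ in range(K):
        grads = torch.autograd.grad(inner_loss(v, phi), phi, create_graph=True)
        phi = [p - inner_lr * g for p, g in zip(phi, grads)]

    # Higher-level update of theta through the unrolled phi.
    opt_theta.zero_grad()
    outer_loss(v, phi).backward()
    opt_theta.step()

    # Separately refresh the stored phi with its own optimizer.
    opt_phi.zero_grad()
    inner_loss(v, q_params).backward()
    opt_phi.step()
```

The coupling the abstract refers to lives in the create_graph=True inner updates: differentiating the higher-level loss through the K unrolled lower-level steps is what lets the EBM parameters account for how the variational posterior reacts to them.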
