Exponential Family Estimation via Adversarial Dynamics Embedding

We present an efficient algorithm for maximum likelihood estimation (MLE) of exponential family models, with a general parametrization of the energy function that includes neural networks. We exploit the primal-dual view of MLE with a kinetics-augmented model to obtain an estimator associated with an adversarial dual sampler. To represent this sampler, we introduce a novel neural architecture, dynamics embedding, that generalizes Hamiltonian Monte Carlo (HMC). The proposed approach inherits the flexibility of HMC while enabling tractable entropy estimation for the augmented model. By learning both a dual sampler and the primal model simultaneously, and sharing parameters between them, we obviate the need to design a separate sampling procedure once the model has been trained, leading to more effective learning. We show that many existing estimators, such as contrastive divergence, pseudo/composite-likelihood, score matching, the minimum Stein discrepancy estimator, non-local contrastive objectives, noise-contrastive estimation, and minimum probability flow, are special cases of the proposed approach, each corresponding to a different (fixed) dual sampler. An empirical investigation shows that adapting the sampler during MLE can significantly improve on state-of-the-art estimators.
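To make the primal-dual view concrete, the following is a minimal sketch based on standard exponential-family duality, not a formula quoted from the paper. For a model $p_\theta(x) \propto \exp(f_\theta(x))$ with energy $f_\theta$, the log-partition function admits the variational (Fenchel-dual) representation
\[
\log Z(\theta) \;=\; \max_{q}\; \mathbb{E}_{x\sim q}\!\left[f_\theta(x)\right] + H(q),
\]
so MLE with respect to the empirical distribution $\hat p$ becomes the saddle-point problem
\[
\max_{\theta}\;\min_{q}\; \mathbb{E}_{x\sim \hat p}\!\left[f_\theta(x)\right] \;-\; \mathbb{E}_{x\sim q}\!\left[f_\theta(x)\right] \;-\; H(q).
\]
The inner problem is handled by a learned dual sampler $q$ whose entropy $H(q)$ must remain tractable; in the approach above, this role is played by the dynamics-embedding sampler built from HMC-like updates, while fixing $q$ to particular hand-designed samplers recovers the existing estimators listed in the abstract.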
