Coupled Variational Bayes via Optimization Embedding

Variational inference plays a vital role in learning graphical models, especially on large-scale datasets. Much of its success depends on a proper choice of the auxiliary distribution class for posterior approximation. However, pursuing an auxiliary distribution class that achieves both good approximation ability and computational efficiency remains a core challenge. In this paper, we propose coupled variational Bayes (CVB), which exploits the primal-dual view of the evidence lower bound (ELBO) with a variational distribution class generated by an optimization procedure, a construction we term optimization embedding. This flexible function class couples the variational distribution with the original parameters of the graphical model, allowing end-to-end learning of the graphical model by back-propagation through the variational distribution. Theoretically, we establish an interesting connection to gradient flow and demonstrate the extreme flexibility of this implicit distribution family in the limit. Empirically, we demonstrate the effectiveness of the proposed method on multiple graphical models with either continuous or discrete latent variables, compared to state-of-the-art methods.
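Since only the abstract is preserved here, a brief sketch may help make "optimization embedding" concrete. Recall the variational representation underlying the ELBO, log p_θ(x) = max_q E_{q(z)}[log p_θ(x, z)] + H(q): the optimal q is itself the output of an optimization procedure, and the idea is to unroll such a procedure so that gradients flow from the learning objective back into θ through the inference steps. The Python sketch below is an illustrative reconstruction under stated assumptions, not the authors' implementation: the decoder architecture, the Gaussian observation model, the step size `eta`, the number of unrolled steps, and the omission of the entropy term are all simplifying assumptions.

```python
import torch

# Illustrative sketch of "optimization embedding" (hypothetical names
# throughout): the variational samples z are produced by unrolling a few
# gradient-ascent steps on the log-joint, and because the unrolled graph is
# kept, the model parameters theta receive gradients *through* inference.

# Assumed toy model: z in R^2, x in R^8, Gaussian decoder p(x|z) with a
# standard-normal prior p(z).
decoder = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 8)
)

def log_joint(x, z):
    # log p_theta(x | z) + log p(z), up to additive constants
    recon = decoder(z)
    log_lik = -0.5 * ((x - recon) ** 2).sum(-1)
    log_prior = -0.5 * (z ** 2).sum(-1)
    return log_lik + log_prior

def optimization_embedding(x, num_steps=5, eta=0.1):
    # Initial particles from the prior; requires_grad so we can ascend on z.
    z = torch.randn(x.shape[0], 2, requires_grad=True)
    for _ in range(num_steps):
        obj = log_joint(x, z).sum()
        # create_graph=True retains higher-order terms, so back-propagation
        # later differentiates through these inference steps w.r.t. theta.
        (grad_z,) = torch.autograd.grad(obj, z, create_graph=True)
        z = z + eta * grad_z
    return z

x = torch.randn(16, 8)               # toy mini-batch
z = optimization_embedding(x)        # variational samples coupled to theta
# Monte Carlo surrogate objective (the entropy term of the full ELBO is
# omitted here for brevity).
loss = -log_joint(x, z).mean()
loss.backward()                      # gradients flow through the unrolled steps
```

In the limit of many steps with a vanishing step size, such an unrolled update resembles a discretized gradient flow, which is the connection the abstract alludes to.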
