SUMO: Unbiased Estimation of Log Marginal Probability for Latent Variable Models

The standard variational lower bounds used to train latent variable models produce biased estimates of most quantities of interest. We introduce an unbiased estimator of the log marginal likelihood and its gradients for latent variable models, based on randomized truncation of infinite series. When the model is parameterized by an encoder-decoder architecture, the encoder parameters can be optimized to minimize the variance of this estimator. We show that models trained with our estimator achieve better test-set likelihoods than a standard importance-sampling-based approach at the same average computational cost. The estimator also enables the use of latent variable models for tasks where unbiased estimators, rather than marginal likelihood lower bounds, are preferred, such as minimizing reverse KL divergences and estimating score functions.
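The core device, randomized truncation of an infinite series (a "Russian roulette" estimator), can be illustrated on a toy convergent series. The sketch below is a minimal, hypothetical illustration of the general technique, not the paper's SUMO estimator itself: a random truncation point K is sampled, and each kept term is reweighted by its survival probability P(K ≥ k), which makes the finite sum an unbiased estimate of the infinite one.

```python
import numpy as np

rng = np.random.default_rng(0)

def russian_roulette_estimate(delta, p_stop=0.5, rng=rng):
    """One unbiased sample of sum_{k>=1} delta(k) via randomized truncation.

    K is geometric with stopping probability p_stop, so the survival
    probability is P(K >= k) = (1 - p_stop)**(k - 1). Dividing each
    term by this probability makes the truncated sum unbiased:
    E[sum_{k<=K} delta(k) / P(K >= k)] = sum_{k>=1} delta(k).
    """
    total, k = 0.0, 1
    while True:
        total += delta(k) / (1.0 - p_stop) ** (k - 1)
        if rng.random() < p_stop:  # roll the roulette: stop here?
            return total
        k += 1

# Toy target: sum_{k>=1} 2**-k = 1. Averaging many randomized
# truncations recovers the full infinite sum despite each sample
# computing only finitely many terms.
est = np.mean([russian_roulette_estimate(lambda k: 2.0 ** -k)
               for _ in range(200_000)])
```

In the paper's setting, the series terms are differences of successively tighter importance-weighted bounds on the log marginal likelihood, so the same reweighting trick yields an unbiased estimate of log p(x) at finite expected cost; the trade-off is that lighter-tailed truncation distributions reduce cost but inflate variance.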
