NEO: Non Equilibrium Sampling on the Orbit of a Deterministic Transform

Sampling from a complex distribution π and approximating its intractable normalizing constant Z are challenging problems. In this paper, a novel family of importance samplers (IS) and Markov chain Monte Carlo (MCMC) samplers is derived. Given an invertible map T, these schemes combine (with weights) elements from the forward and backward orbits through points sampled from a proposal distribution ρ. The map T does not leave the target π invariant, hence the name NEO, standing for Non-Equilibrium Orbits. NEO-IS provides unbiased estimators of the normalizing constant and self-normalized IS estimators of expectations under π, while NEO-MCMC combines multiple NEO-IS estimates of the normalizing constant with an iterated sampling-importance-resampling mechanism to sample from π. For T chosen as a discrete-time integrator of a conformal Hamiltonian system, NEO-IS achieves state-of-the-art performance on difficult benchmarks, and NEO-MCMC is able to explore highly multimodal targets. We also provide detailed theoretical results for both methods; in particular, we show that NEO-MCMC is uniformly geometrically ergodic and establish explicit mixing-time estimates under mild conditions.

35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia.
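The orbit-mixture idea behind NEO-IS can be illustrated with a minimal, self-contained sketch. Everything below is illustrative and simplified, not the paper's actual estimator: a contractive 1D affine map T stands in for one step of a conformal Hamiltonian integrator, the target is a toy bimodal density, and the proposal ρ is a broad Gaussian. The construction pushes ρ forward along the orbit of T; the resulting mixture density q (which involves the backward orbit T⁻ˡ) serves as the effective proposal, and Rao-Blackwellizing over the orbit position gives an unbiased estimator of Z, echoing the abstract's "forward and backward orbits" weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Unnormalized bimodal target; its true normalizing constant is
# Z = 2 * sqrt(2 * pi) ≈ 5.013.
def log_pi_tilde(x):
    return np.logaddexp(-0.5 * (x - 3.0) ** 2, -0.5 * (x + 3.0) ** 2)

# Broad Gaussian proposal rho = N(0, 5^2).
SIG = 5.0
def log_rho(x):
    return -0.5 * (x / SIG) ** 2 - 0.5 * np.log(2.0 * np.pi * SIG ** 2)

# Invertible affine map T(x) = a*x + b, a toy stand-in for a dissipative
# (conformal) integrator step; the Jacobian of T^k is a^k.
a, b, K = 0.9, 0.5, 5
omega = np.full(K, 1.0 / K)           # mixture weights over orbit positions

def forward_orbit(x):
    """Stack of orbit points T^0 x, ..., T^{K-1} x, shape (K, N)."""
    ys, y = [], x
    for _ in range(K):
        ys.append(y)
        y = a * y + b
    return np.stack(ys)

def log_q(y):
    """log of the orbit-mixture density
    q(y) = sum_l omega_l * rho(T^{-l} y) * |Jac T^{-l}(y)|,
    i.e. the density of T^k X with k ~ omega and X ~ rho."""
    terms, z = [], y
    for l in range(K):
        terms.append(np.log(omega[l]) + log_rho(z) - l * np.log(a))
        z = (z - b) / a               # one backward step T^{-1}
    return np.logaddexp.reduce(np.stack(terms), axis=0)

# Rao-Blackwellized orbit IS: Z_hat(x) = sum_k omega_k pi~(T^k x) / q(T^k x)
# is unbiased for Z, since T^k x (k ~ omega, x ~ rho) has density q.
N = 200_000
x = rng.normal(0.0, SIG, size=N)
ys = forward_orbit(x)                                      # (K, N)
log_w = np.log(omega)[:, None] + log_pi_tilde(ys) - log_q(ys)
Z_hat = np.exp(np.logaddexp.reduce(log_w, axis=0)).mean()
print(f"Z_hat = {Z_hat:.3f}  (true Z = {2 * np.sqrt(2 * np.pi):.3f})")
```

Note that each sample's weight averages target-to-mixture ratios over the whole orbit, so points from a single proposal draw can land near several modes. NEO's actual weighting, its choice of T as a conformal Hamiltonian integrator, and the NEO-MCMC resampling mechanism are richer than this sketch.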
