Ergodic Measure Preserving Flows

Probabilistic modelling is a general and elegant framework for capturing the uncertainty, ambiguity and diversity of data. Probabilistic inference is the core technique for developing training and simulation algorithms on probabilistic models. However, classic inference methods, such as Markov chain Monte Carlo (MCMC) and mean-field variational inference (VI), are not computationally scalable to recently developed probabilistic models built on neural networks (NNs). This has motivated many recent works on improving classic inference methods with NNs, in particular NN-empowered VI. However, even with powerful NNs, VI still suffers from fundamental limitations. In this work, we propose a novel, computationally scalable, general inference framework. Grounded in ergodic theory, the proposed method is not only computationally scalable like NN-based VI but also asymptotically accurate like MCMC. We test our method on popular benchmark problems, and the results suggest that it can outperform NN-based VI and MCMC on deep generative models and Bayesian neural networks.
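
To make the high-level recipe concrete, below is a minimal NumPy sketch, not the paper's actual algorithm, of the general idea the abstract alludes to: draw samples from a cheap variational-style base distribution, then push them through measure-preserving Hamiltonian (leapfrog) flow steps targeting the posterior, so that repeated flows drive the samples toward the target. The toy banana-shaped density, the function names `log_target`/`leapfrog_flow`, and all step sizes are illustrative assumptions.

```python
import numpy as np

def log_target(z):
    """Unnormalized log density of a toy target (2-D banana-shaped posterior)."""
    z1, z2 = z[..., 0], z[..., 1]
    return -0.5 * z1**2 - 0.5 * (z2 - z1**2) ** 2

def grad_log_target(z):
    """Gradient of log_target, used by the Hamiltonian (leapfrog) flow."""
    z1, z2 = z[..., 0], z[..., 1]
    dz1 = -z1 + 2.0 * z1 * (z2 - z1**2)
    dz2 = -(z2 - z1**2)
    return np.stack([dz1, dz2], axis=-1)

def leapfrog_flow(z, step_size=0.1, n_steps=20, rng=None):
    """One Hamiltonian flow: refresh momentum, then integrate with leapfrog.

    Momentum refreshment followed by leapfrog integration is a volume-preserving
    map that (up to discretization error) leaves exp(log_target) invariant.
    """
    rng = np.random.default_rng() if rng is None else rng
    r = rng.standard_normal(z.shape)              # auxiliary momentum
    r = r + 0.5 * step_size * grad_log_target(z)  # initial half kick
    for _ in range(n_steps):
        z = z + step_size * r                     # full position (drift) step
        r = r + step_size * grad_log_target(z)    # full momentum (kick) step
    r = r - 0.5 * step_size * grad_log_target(z)  # fold last kick into a half kick
    return z

# Draw from a crude base approximation (standard normal), then push the samples
# through repeated flows so they concentrate on the target distribution.
rng = np.random.default_rng(0)
z = rng.standard_normal((1000, 2))   # scalable, cheap initial draw
for _ in range(5):                   # more flow applications -> closer to the target
    z = leapfrog_flow(z, rng=rng)
```

Under these assumptions, the base draw plays the role of the scalable VI-style approximation, while the repeated measure-preserving flows supply the MCMC-style asymptotic accuracy the abstract contrasts it with.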
