Meta-Learning MCMC Proposals

Effective implementations of sampling-based probabilistic inference often require manually constructed, model-specific proposals. Inspired by recent progress in meta-learning for training agents that generalize to unseen environments, we propose a meta-learning approach to building effective and generalizable MCMC proposals. We parametrize the proposal as a neural network that provides fast approximations to block Gibbs conditionals. The learned neural proposals generalize to occurrences of common structural motifs across different models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications, including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler yields higher final F1 scores than classical single-site Gibbs sampling.
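
As a rough illustration of the idea, the sketch below (not from the paper) shows how a learned neural proposal over a block of latent variables, conditioned on the block's Markov blanket, could be used inside a Metropolis-Hastings update. The names ProposalNet, neural_mh_step, and log_joint, and the diagonal-Gaussian output, are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch, assuming a model that exposes log_joint(x) over all latent
# variables, with `block` and `blanket_idx` index tensors selecting the block
# to be resampled and its Markov blanket. Names are hypothetical.
import torch
import torch.nn as nn

class ProposalNet(nn.Module):
    """Maps the values of a block's Markov blanket to a diagonal-Gaussian proposal."""
    def __init__(self, blanket_dim, block_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(blanket_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
        )
        self.mean = nn.Linear(hidden, block_dim)
        self.log_std = nn.Linear(hidden, block_dim)

    def forward(self, blanket):
        h = self.body(blanket)
        return torch.distributions.Normal(self.mean(h), self.log_std(h).exp())

def neural_mh_step(x, block, blanket_idx, log_joint, proposal):
    """One Metropolis-Hastings update of x[block], proposing from the network."""
    q = proposal(x[blanket_idx])          # blanket is unchanged by the update,
    x_new = x.clone()                     # so forward and reverse proposals coincide
    x_new[block] = q.sample()
    log_alpha = (log_joint(x_new) - log_joint(x)
                 + q.log_prob(x[block]).sum()
                 - q.log_prob(x_new[block]).sum())
    if torch.log(torch.rand(())) < log_alpha:
        return x_new
    return x
```

In this framing, the network would be trained so that its output distribution approximates the block's Gibbs conditional (for example, by maximizing the log-density of block values sampled from the joint, given their blankets), while the Metropolis-Hastings correction keeps the chain asymptotically exact even when that approximation is imperfect.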
