Approximate Inference in Discrete Distributions with Monte Carlo Tree Search and Value Functions

Many problems in AI, engineering, and the sciences are naturally formalized as inference in discrete probabilistic models. Exact inference is often prohibitively expensive, as it may require evaluating the (unnormalized) target density on its entire domain. Here we consider the setting where only a limited budget of calls to the unnormalized density oracle is available, which raises the question of where in the domain to allocate these calls in order to construct a good approximate solution. We formulate this problem as an instance of sequential decision-making under uncertainty and leverage methods from reinforcement learning for probabilistic inference with budget constraints. In particular, we propose the TreeSample algorithm, an adaptation of Monte Carlo Tree Search to approximate inference. This algorithm caches all previous queries to the density oracle in an explicit search tree and dynamically allocates new queries using a "best-first" exploration heuristic based on existing upper confidence bound methods. Our non-parametric inference method can be effectively combined with neural networks that compile approximate conditionals of the target, which are then used to guide the inference search and enable generalization across multiple target distributions. We show empirically that TreeSample outperforms standard approximate inference methods on synthetic factor graphs.
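
To make the budgeted-query setting concrete, the following is a minimal Python sketch of the general idea described above: cache every oracle call in a search tree over partial assignments and choose the next query by a UCB-style best-first descent. It is an illustration under stated assumptions, not the TreeSample algorithm itself. The function names (`budgeted_tree_search`, `normalize`), the use of the mean leaf log-density as the node value, the exploration constant `c`, and the toy chain target are all hypothetical choices made for this example.

```python
import math
import random


def budgeted_tree_search(log_density, n_vars, n_values, budget, c=1.0, seed=0):
    """Query `log_density` on at most `budget` distinct complete assignments.

    Assumption for this sketch: a node's value is the mean log-density of the
    leaves evaluated beneath it, combined with a UCB1-style exploration bonus.
    """
    rng = random.Random(seed)
    budget = min(budget, n_values ** n_vars)  # cannot query more states than exist
    visits, total = {}, {}  # per-prefix visit count and summed leaf log-density
    done = set()            # prefixes whose entire subtree has been evaluated
    cache = {}              # complete assignment -> log-density (one oracle call each)

    def ucb(parent, child):
        if child not in visits:
            return math.inf  # unvisited children are tried first
        mean = total[child] / visits[child]
        return mean + c * math.sqrt(math.log(visits[parent] + 1) / visits[child])

    while len(cache) < budget:
        prefix, path = (), [()]
        # Descend from the root, skipping subtrees that are already exhausted.
        while len(prefix) < n_vars:
            open_children = [prefix + (v,) for v in range(n_values)
                             if prefix + (v,) not in done]
            rng.shuffle(open_children)  # random tie-breaking
            parent = prefix
            prefix = max(open_children, key=lambda ch: ucb(parent, ch))
            path.append(prefix)
        logp = log_density(prefix)  # the only place the oracle is called
        cache[prefix] = logp
        done.add(prefix)
        for node in path:  # back up statistics along the visited path
            visits[node] = visits.get(node, 0) + 1
            total[node] = total.get(node, 0.0) + logp
        for node in reversed(path[:-1]):  # mark exhausted ancestors
            if all(node + (v,) in done for v in range(n_values)):
                done.add(node)
            else:
                break
    return cache


def normalize(cache):
    """Normalize over the queried states only; log_z is a lower bound on the true log Z."""
    m = max(cache.values())
    log_z = m + math.log(sum(math.exp(lp - m) for lp in cache.values()))
    return {x: math.exp(lp - log_z) for x, lp in cache.items()}, log_z


def log_density(x):
    # Hypothetical toy target: a chain that favors agreeing neighbors.
    return sum(1.5 if a == b else -1.5 for a, b in zip(x, x[1:]))


cache = budgeted_tree_search(log_density, n_vars=6, n_values=2, budget=20)
probs, log_z_hat = normalize(cache)
full_cache = budgeted_tree_search(log_density, n_vars=6, n_values=2, budget=1000)
full_probs, log_z_full = normalize(full_cache)
```

Because the approximation is supported only on the queried states, its normalizer sums over a subset of the domain and so cannot exceed the true one. When the budget covers the whole domain, as in the second call, the search degenerates to exact enumeration.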
