Reinforced Variational Inference

Recent years have seen an increase in the complexity and scale of probabilistic models used to understand and analyze data, with a corresponding increase in the difficulty of performing inference. An important enabling factor in this context has been the development of stochastic gradient algorithms for learning variational approximations to posterior distributions. In a separate line of work, researchers have investigated how to use probabilistic inference for the problem of optimal control [5, 6, 12, 17, 25]. By viewing control as an inference problem, they showed that algorithms from the inference literature (e.g., belief propagation) can be ‘borrowed’ and turned into control algorithms. In this work, we do just the opposite: we formally map the problem of learning approximate posterior distributions in variational inference (VI) onto the policy optimization problem in reinforcement learning (RL), and explain this connection at two levels.
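To make the mapping concrete, consider its best-known special case: drawing a sample z ~ q(z; φ) can be read as a one-step stochastic policy whose reward is log p(x, z) − log q(z; φ), so the score-function (REINFORCE [11, 23]) estimator of the ELBO gradient is exactly a policy-gradient update, as exploited in black-box variational inference [26]. The sketch below illustrates this correspondence for a Gaussian q fit to a toy target; it is a minimal NumPy illustration under assumed names (log_joint, elbo_grad_step) and an assumed target density, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_joint(z):
    # Illustrative unnormalized log p(x, z): a Gaussian with mean 2, std 1.
    return -0.5 * (z - 2.0) ** 2

def elbo_grad_step(mu, log_sigma, n_samples=256, lr=0.05):
    sigma = np.exp(log_sigma)
    # "Actions": samples from the variational distribution q(z) = N(mu, sigma^2).
    z = mu + sigma * rng.standard_normal(n_samples)
    log_q = -0.5 * ((z - mu) / sigma) ** 2 - log_sigma - 0.5 * np.log(2.0 * np.pi)
    # Per-sample "reward": the integrand of the ELBO.
    reward = log_joint(z) - log_q
    reward = reward - reward.mean()  # baseline for variance reduction
    # Score-function (REINFORCE / likelihood-ratio) gradients of log q:
    d_mu = (z - mu) / sigma ** 2
    d_log_sigma = ((z - mu) / sigma) ** 2 - 1.0
    # Gradient ascent on the ELBO, i.e. a policy-gradient update.
    mu = mu + lr * np.mean(reward * d_mu)
    log_sigma = log_sigma + lr * np.mean(reward * d_log_sigma)
    return mu, log_sigma

mu, log_sigma = 0.0, 0.0
for _ in range(2000):
    mu, log_sigma = elbo_grad_step(mu, log_sigma)
print(mu, np.exp(log_sigma))  # approaches the toy target's mean 2 and std 1
```

The mean-reward baseline subtracted before each update leaves the gradient estimator unbiased and mirrors the reward baselines introduced for REINFORCE [11].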

References

[1] Brian Sallans et al., A Hierarchical Community of Experts, 1999, Learning in Graphical Models.

[2] Shakir Mohamed et al., Variational Inference with Normalizing Flows, 2015, ICML.

[3] Emanuel Todorov et al., General duality between optimal control and estimation, 2008, 47th IEEE Conference on Decision and Control.

[4] Daan Wierstra et al., Deep AutoRegressive Networks, 2014, ICML.

[5] Michael I. Jordan et al., Variational Bayesian Inference with Stochastic Search, 2012, ICML.

[6] Gerhard Neumann et al., Variational Inference for Policy Search in Changing Situations, 2011, ICML.

[7] Richard S. Sutton et al., Reinforcement Learning: An Introduction, 1998, MIT Press.

[8] Yuval Tassa et al., Learning Continuous Control Policies by Stochastic Value Gradients, 2015, NIPS.

[9] Max Welling et al., Markov Chain Monte Carlo and Variational Inference: Bridging the Gap, 2015, ICML.

[10] J. Andrew Bagnell et al., Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy, 2010.

[11] Ronald J. Williams, Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, 1992, Machine Learning.

[12] Peter L. Bartlett et al., Infinite-Horizon Policy-Gradient Estimation, 2001, Journal of Artificial Intelligence Research.

[13] Pieter Abbeel et al., Gradient Estimation Using Stochastic Computation Graphs, 2015, NIPS.

[14] Marc Toussaint et al., On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference, 2012, Robotics: Science and Systems.

[15] E. Todorov et al., A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, 2005, American Control Conference.

[16] Geoffrey E. Hinton et al., Using Expectation-Maximization for Reinforcement Learning, 1997, Neural Computation.

[17] Vicenç Gómez et al., Optimal control as a graphical model inference problem, 2012, Machine Learning.

[18] John N. Tsitsiklis et al., Simulation-based optimization of Markov reward processes, 2001, IEEE Transactions on Automatic Control.

[19] Koray Kavukcuoglu et al., Multiple Object Recognition with Visual Attention, 2015, ICLR.

[20] Chong Wang et al., Stochastic variational inference, 2013, Journal of Machine Learning Research.

[21] David Wingate et al., Automated Variational Inference in Probabilistic Programming, 2013, arXiv.

[22] Andrew W. Moore et al., Gradient Descent for General Reinforcement Learning, 1998, NIPS.

[23] Peter W. Glynn, Likelihood ratio gradient estimation for stochastic systems, 1990, Communications of the ACM.

[24] Jan Peters et al., A Survey on Policy Search for Robotics, 2013, Foundations and Trends in Robotics.

[25] Marc Toussaint et al., Probabilistic inference for solving discrete and continuous state Markov Decision Processes, 2006, ICML.

[26] Sean Gerrish et al., Black Box Variational Inference, 2014, AISTATS.

[27] Karol Gregor et al., Neural Variational Inference and Learning in Belief Networks, 2014, ICML.

[28] Daan Wierstra et al., Stochastic Backpropagation and Approximate Inference in Deep Generative Models, 2014, ICML.