On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference

We present a reformulation of the stochastic optimal control problem in terms of KL divergence minimisation. This reformulation not only provides a unifying perspective on previous approaches in this area, but also leads to novel practical approaches to the control problem. Specifically, a natural relaxation of the dual formulation gives rise to exact iterative solutions to the finite- and infinite-horizon stochastic optimal control problems, while direct application of Bayesian inference methods yields instances of risk-sensitive control. We furthermore study corresponding formulations in the reinforcement learning setting and present model-free algorithms for problems with both discrete and continuous state and action spaces. Evaluation of the proposed methods on the standard Gridworld and Cart-Pole benchmarks verifies the theoretical insights and shows that the proposed methods improve upon current approaches.
