On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference (Extended Abstract)

We reformulate the stochastic optimal control problem as a KL divergence minimisation. This reformulation provides a unifying perspective on previous approaches in this area and also leads to novel practical approaches to the control problem. Specifically, a natural relaxation of the dual formulation gives rise to exact iterative solutions to the finite- and infinite-horizon stochastic optimal control problems, while direct application of Bayesian inference methods yields instances of risk-sensitive control.
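
To give a feel for how a KL-based objective can produce an iterative scheme, the following toy sketch considers a single decision with finitely many actions. It is an illustration under stated assumptions, not the authors' algorithm: the names `kl_regularised_step` and `iterate`, the cost vector, and the parameter `eta` are all hypothetical. Each step minimises the expected cost plus (1/eta) times KL(pi || prior), which has the closed-form solution pi(u) proportional to prior(u) * exp(-eta * cost(u)); feeding the result back in as the next prior concentrates the policy on the lowest-cost action.

```python
import math


def kl_regularised_step(prior, costs, eta):
    """One KL-regularised update for a single decision (illustrative only).

    Returns the distribution minimising
        sum_u pi(u) * costs[u] + (1 / eta) * KL(pi || prior),
    whose closed form is pi(u) proportional to prior(u) * exp(-eta * costs[u]).
    """
    weights = [p * math.exp(-eta * c) for p, c in zip(prior, costs)]
    normaliser = sum(weights)
    return [w / normaliser for w in weights]


def iterate(costs, eta=1.0, n_iters=50):
    """Repeat the update, using each result as the prior of the next step."""
    policy = [1.0 / len(costs)] * len(costs)  # start from a uniform policy
    for _ in range(n_iters):
        policy = kl_regularised_step(policy, costs, eta)
    return policy


# Hypothetical costs for three actions; action 1 is the cheapest.
costs = [1.0, 0.2, 0.7]
policy = iterate(costs)
best_action = max(range(len(policy)), key=lambda u: policy[u])
```

In this one-step setting a single update gives a "soft" policy that still spreads probability over suboptimal actions, while repeating the update drives the policy towards the deterministic cost-minimising choice. The sequential, finite- and infinite-horizon cases treated in the paper are not covered by this sketch.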
