Path integral reinforcement learning

Reinforcement learning is one of the most fundamental frameworks of learning control, but applying it to high-dimensional control systems, e.g., humanoid robots, has largely been impossible so far. Among the key problems are that classical value-function-based approaches run into severe limitations in continuous state-action spaces due to issues of function approximation of value functions, and, moreover, that the computational complexity and time of exploring high-dimensional state-action spaces quickly exceed practical feasibility. As an alternative, researchers have turned to trajectory-based reinforcement learning, which sacrifices global optimality in favor of being applicable to high-dimensional state-action spaces. Model-based approaches, inspired by ideas from differential dynamic programming, have demonstrated some success if models are accurate, but model-free trajectory-based reinforcement learning has been limited by slow learning and the need to tune many open parameters. In this paper, we review some recent developments in trajectory-based reinforcement learning using the framework of stochastic optimal control with path integrals. The path integral control approach transforms the optimal control problem into an estimation problem based on Monte Carlo evaluations of a path integral. Based on this idea, a new reinforcement learning algorithm can be derived, called Policy Improvement with Path Integrals (PI2). PI2 is surprisingly simple and works as a black-box learning system, i.e., without the need for manual parameter tuning. Moreover, it learns fast and efficiently in very high-dimensional problems, as we demonstrate in a variety of robotic tasks. Interestingly, PI2 can be applied in model-free, hybrid, and model-based scenarios. Given its solid foundation in stochastic optimal control, path integral reinforcement learning offers a wide range of applications of reinforcement learning to very complex and new domains.
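To make the update concrete, here is a minimal Python sketch of one PI2-style policy-improvement step: parameters are perturbed with exploration noise, each rollout's total cost is mapped to a probability via an exponential (softmax) weighting, and the parameter update is the probability-weighted average of the noise. This is a simplified, hedged illustration of the idea, not the paper's full algorithm (which applies the weighting per time step and per basis function); the names rollout_cost, n_rollouts, exploration_std, and temperature are illustrative assumptions.

```python
import numpy as np

def pi2_update(theta, rollout_cost, n_rollouts=20,
               exploration_std=0.1, temperature=10.0):
    """One simplified PI2 iteration: perturb the policy parameters,
    evaluate trajectory costs, and return the probability-weighted
    parameter update (low cost -> high weight)."""
    eps = exploration_std * np.random.randn(n_rollouts, theta.size)
    costs = np.array([rollout_cost(theta + e) for e in eps])
    # Normalize costs to [0, 1] so the temperature is scale-free.
    s = (costs - costs.min()) / (np.ptp(costs) + 1e-10)
    # Exponential weighting of rollouts, normalized to probabilities.
    p = np.exp(-temperature * s)
    p /= p.sum()
    # The update is the expectation of the exploration noise under p.
    return theta + p @ eps

# Toy usage: minimize a quadratic stand-in for a trajectory cost.
theta = np.ones(5)
cost = lambda th: float(np.sum(th ** 2))
for _ in range(100):
    theta = pi2_update(theta, cost)
print(theta)  # approaches the zero vector
```

Note how the update requires no gradient of the cost and has essentially one sensitive parameter (the temperature of the exponential weighting), which is what makes PI2 attractive as a black-box learner.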
