Learned graphical models for probabilistic planning provide a new class of movement primitives

Biological movement generation combines three interesting aspects: its modular organization in movement primitives (MPs), its stochastic optimality under perturbations, and its efficiency in terms of learning. A common approach to motor skill learning is to endow the primitives with dynamical systems; the parameters of the primitive then indirectly define the shape of a reference trajectory. We propose an alternative MP representation based on probabilistic inference in learned graphical models, with new and interesting properties that comply with salient features of biological movement control. Instead of endowing the primitives with dynamical systems, we endow MPs with an intrinsic probabilistic planning system, integrating the power of stochastic optimal control (SOC) methods within an MP. The parameterization of the primitive is a graphical model that represents the dynamics and an intrinsic cost function, such that inference in this graphical model yields the control policy. We parameterize the intrinsic cost function using task-relevant features, such as the importance of passing through certain via-points. The system dynamics as well as the parameters of the intrinsic cost function are learned in a reinforcement learning (RL) setting. We evaluate our approach on a complex 4-link balancing task. Our experiments show that our movement representation significantly facilitates learning and leads to better generalization to new task settings without re-learning.
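
As a purely illustrative sketch (not the paper's implementation), the Python snippet below shows the linear-Gaussian special case in which inference in the graphical model reduces to a Riccati-style backward recursion: a quadratic intrinsic cost built from via-point features yields a time-varying feedback policy. All dynamics, weights, and via-points below are assumed values chosen for illustration; the learned dynamics, the RL-based learning of the cost parameters, and the 4-link balancing task of the paper are not reproduced here.

```python
# Minimal sketch of planning-as-inference in the linear-quadratic-Gaussian case.
# Via-point costs stand in for the task-relevant cost features; everything here
# (dynamics, weights, via-points) is a hypothetical toy setup, not the paper's model.
import numpy as np

T, dt = 50, 0.05
A = np.array([[1.0, dt], [0.0, 1.0]])   # double-integrator state: (position, velocity)
B = np.array([[0.0], [dt]])
R = 1e-3 * np.eye(1)                    # control-effort weight

# Intrinsic cost features: pass near position 1.0 at t=25, return to 0.0 at t=49.
via_points = {25: (np.array([1.0, 0.0]), 1e3),
              49: (np.array([0.0, 0.0]), 1e3)}

def cost_terms(t):
    """Quadratic/linear state-cost terms (Q_t, q_t) derived from the via-point features."""
    if t in via_points:
        target, w = via_points[t]
        return w * np.eye(2), -w * target
    return 1e-6 * np.eye(2), np.zeros(2)

# Backward recursion for the affine feedback policy u_t = K_t x_t + k_t
# (the exact-inference solution of the corresponding Gaussian graphical model).
S, s = cost_terms(T - 1)
policy = [None] * (T - 1)
for t in reversed(range(T - 1)):
    Q, q = cost_terms(t)
    H = R + B.T @ S @ B
    G = B.T @ S @ A
    g = B.T @ s
    K = -np.linalg.solve(H, G)
    k = -np.linalg.solve(H, g)
    policy[t] = (K, k)
    S = Q + A.T @ S @ A + G.T @ K
    s = q + A.T @ s + G.T @ k

# Roll out the resulting policy from the origin and inspect the via-point states.
x = np.zeros(2)
trajectory = [x.copy()]
for t in range(T - 1):
    K, k = policy[t]
    x = A @ x + B @ (K @ x + k)
    trajectory.append(x.copy())
print(np.round(trajectory[25], 3), np.round(trajectory[-1], 3))
```

In this degenerate case the "planning system inside the primitive" is just a Riccati sweep; the appeal of the graphical-model view is that the same inference machinery extends to learned, nonlinear dynamics and learned cost features via approximate inference.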
