Applying statistical generalization to determine search direction for reinforcement learning of movement primitives

In this paper we present a new methodology for robot learning that combines ideas from statistical generalization and reinforcement learning. First, we apply statistical generalization to compute an approximation of the optimal control policy, as defined by training movements that solve the given task in a number of specific situations. This way we obtain a manifold of movements whose dimensionality is usually much smaller than that of the full space of movement primitives. Next, we refine the policy by means of reinforcement learning on the approximating manifold, which results in a learning problem constrained to the low-dimensional manifold. We show that in some situations, learning on the low-dimensional manifold can be implemented as an error learning algorithm. We apply golden section search to refine the control policy. Furthermore, we propose a reinforcement learning algorithm with an extended parameter set, which combines learning in the constrained domain with learning in the full space of parametric movement primitives, making it possible to explore actions outside of the initial approximating manifold. The proposed approach was tested on learning a pouring action both in simulation and on a real robot.
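The golden section search mentioned above can be sketched as follows. This is a minimal, generic illustration, not the paper's implementation: the one-dimensional search interval stands in for a coordinate along the low-dimensional manifold, and the `cost` function is a hypothetical stand-in for the rollout cost of executing the movement generated at that manifold coordinate.

```python
import math

def golden_section_search(f, a, b, tol=1e-5):
    """Minimize a unimodal scalar function f on [a, b] via golden section search."""
    invphi = (math.sqrt(5) - 1) / 2  # 1/phi, approximately 0.618
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
    return (a + b) / 2

# Hypothetical example: refine a single manifold coordinate s by
# minimizing a rollout cost; here a quadratic stands in for the
# task cost measured on the robot or in simulation.
cost = lambda s: (s - 0.3) ** 2
s_star = golden_section_search(cost, 0.0, 1.0)
```

Because golden section search only requires cost evaluations (no gradients), it suits settings where each evaluation is a noisy robot rollout, though it assumes the cost is unimodal along the searched coordinate.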
