Policy Search for Motor Primitives in Robotics

Many motor skills in humanoid robotics can be learned using parametrized motor primitives, as is done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems that are often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate-reward case to episodic reinforcement learning. We show that this extension yields a general framework that is closely connected to policy gradient methods and gives rise to a novel, EM-inspired algorithm for policy learning that is particularly well suited for dynamic motor primitives and applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. Finally, we apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task on a real Barrett WAM™ robot arm.
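To make the EM-inspired idea concrete, the following is a minimal sketch of a reward-weighted policy update for a parametrized policy with Gaussian exploration: exponentiated episodic returns act as pseudo-likelihood weights (E-step), and the new parameters are the reward-weighted mean of the perturbed samples (M-step). All function names, shapes, and the toy objective are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sample_rollouts(theta, sigma, rollout_fn, n_rollouts=20):
    """Perturb the policy parameters with Gaussian exploration
    and collect one episodic return per perturbation (assumed interface)."""
    epsilons = np.random.randn(n_rollouts, theta.size) * sigma
    returns = np.array([rollout_fn(theta + eps) for eps in epsilons])
    return epsilons, returns

def em_policy_update(theta, epsilons, returns):
    """E-step: exponentiated, max-shifted returns serve as improper
    'responsibilities'. M-step: reward-weighted mean of the offsets."""
    weights = np.exp(returns - returns.max())
    weights /= weights.sum()
    return theta + weights @ epsilons

# Toy usage: a quadratic objective standing in for an episodic return.
def toy_return(theta):
    return -np.sum((theta - 1.0) ** 2)

theta = np.zeros(5)
for _ in range(100):
    eps, rets = sample_rollouts(theta, sigma=0.1, rollout_fn=toy_return)
    theta = em_policy_update(theta, eps, rets)
print(theta)  # converges toward the optimum at 1.0
```

Unlike a policy gradient step, this update needs no learning rate: the weighted average stays inside the sampled region, which is one reason such EM-style updates are attractive for hardware experiments.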
