Challenges for the policy representation when applying reinforcement learning in robotics

A summary of the state of the art in reinforcement learning for robotics is given, in terms of both algorithms and policy representations. A number of challenges faced by the policy representation in robotics are identified. Two recent examples of applying reinforcement learning to real robots are described: a pancake flipping task and a bipedal walking energy minimization task. In both examples, a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm is used, but a different policy representation is proposed and evaluated for each task. The two proposed policy representations offer viable solutions to four rarely addressed challenges in policy representations: correlations, adaptability, multi-resolution, and globality. Both the successes and the practical difficulties encountered in these examples are discussed.
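
The abstract does not detail the Expectation-Maximization-based algorithm; in the robot policy-search literature, the canonical EM-based method is PoWER (Policy learning by Weighting Exploration with the Returns), whose update moves the policy parameters by a reward-weighted average of exploration perturbations. The sketch below assumes a PoWER-style update; the function name `power_update`, the toy reward, and all constants are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def power_update(theta, rollouts):
    """EM-style policy-search step in the spirit of PoWER: move the
    parameters by the reward-weighted average of the exploration
    perturbations. Weights must be non-negative (e.g. exponentiated
    returns), so the step is a convex combination and needs no
    hand-tuned learning rate."""
    eps = np.array([e for e, _ in rollouts])   # K x D parameter perturbations
    ret = np.array([r for _, r in rollouts])   # K non-negative returns
    return theta + (ret @ eps) / (ret.sum() + 1e-12)

# Toy usage (illustrative only): maximize R(theta) = exp(-||theta - target||^2).
rng = np.random.default_rng(0)
target = np.array([1.0, -0.5])
theta = np.zeros(2)
for _ in range(50):
    rollouts = []
    for _ in range(10):
        eps = 0.3 * rng.standard_normal(2)                  # parameter-space exploration
        r = np.exp(-np.sum((theta + eps - target) ** 2))    # non-negative "return"
        rollouts.append((eps, r))
    theta = power_update(theta, rollouts)
print(theta)  # approaches target
```

Because the update is a convex combination of the explored parameter vectors, it cannot overshoot the best samples and requires no step-size tuning, which is one reason EM-based policy search is attractive for experiments on physical robots, where each rollout is costly.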
