Using Reward-weighted Regression for Reinforcement Learning of Task Space Control

Many robot control problems of practical importance, including task or operational space control, can be reformulated as immediate reward reinforcement learning problems. However, few of the known optimization or reinforcement learning algorithms can be used in online learning control for robots, as they are either prohibitively slow, do not scale to interesting domains of complex robots, or require trying out policies generated by random search, which are infeasible for a physical system. Using a generalization of the EM-base reinforcement learning framework suggested by Dayan & Hinton, we reduce the problem of learning with immediate rewards to a reward-weighted regression problem with an adaptive, integrated reward transformation for faster convergence. The resulting algorithm is efficient, learns smoothly without dangerous jumps in solution space, and works well in applications of complex high degree-of-freedom robots

[1]  James C. Spall,et al.  Introduction to stochastic search and optimization - estimation, simulation, and control , 2003, Wiley-Interscience series in discrete mathematics and optimization.

[2]  Stefan Schaal,et al.  Learning inverse kinematics , 2001, Proceedings 2001 IEEE/RSJ International Conference on Intelligent Robots and Systems. Expanding the Societal Role of Robotics in the the Next Millennium (Cat. No.01CH37180).

[3]  Jun Nakanishi,et al.  A unifying methodology for the control of robotic systems , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[4]  Mitsuo Kawato,et al.  Multiple Paired Forward-Inverse Models for Human Motor Learning and Control , 1998, NIPS.

[5]  Oussama Khatib,et al.  A unified approach for motion and force control of robot manipulators: The operational space formulation , 1987, IEEE J. Robotics Autom..

[6]  Roy Featherstone,et al.  Robot Dynamics Algorithms , 1987 .

[7]  Stefan Schaal,et al.  Scalable Techniques from Nonparametric Statistics for Real Time Robot Learning , 2002, Applied Intelligence.

[8]  S. Grossberg,et al.  A Self-Organizing Neural Model of Motor Equivalent Reaching and Tool Use by a Multijoint Arm , 1993, Journal of Cognitive Neuroscience.

[9]  Michael I. Jordan,et al.  Forward Models: Supervised Learning with a Distal Teacher , 1992, Cogn. Sci..

[10]  Andrew W. Moore,et al.  Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..

[11]  Jun Nakanishi,et al.  Comparative experiments on task space control with redundancy resolution , 2005, 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[12]  James C. Spall,et al.  Introduction to Stochastic Search and Optimization. Estimation, Simulation, and Control (Spall, J.C. , 2007 .

[13]  M Kawato,et al.  Internal models for motor control. , 2007, Novartis Foundation symposium.

[14]  Alessandro De Luca,et al.  Learning control for redundant manipulators , 1991, Proceedings. 1991 IEEE International Conference on Robotics and Automation.

[15]  Geoffrey E. Hinton,et al.  Using Expectation-Maximization for Reinforcement Learning , 1997, Neural Computation.