Convergence Analysis of Reinforcement Learning Approaches to Humanoid Locomotion

Sophisticated intelligent machines such as humanoid robots must be able to interact with their environment and adapt their behavior efficiently. They must therefore be able to modify and extend their knowledge base using information gained from past behavior, for example to achieve stable, robust walking on unseen terrain. Designing humanoid robots with advanced learning and cognitive capabilities is currently one of the most challenging problems in intelligent robotics. The iCub and its newer version, the C-Cub, were developed as test beds for evaluating how cognitive and learning approaches can operate safely in unstructured environments. This paper describes preliminary work on evaluating the convergence of several temporal difference learning algorithms and compares their results on a simulation of a simple inverted pendulum, chosen so that the value and control action functions can be visualized. The results show that the learning performance of TD(λ) is significantly better than that of TD(0) and of stochastic gradient algorithm (SGA) based learning.
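
For readers unfamiliar with the two temporal difference variants named above, the following is a minimal sketch of tabular TD(λ) value estimation with accumulating eligibility traces; setting λ = 0 reduces it to TD(0). It is not the paper's implementation: the task (a five-state random walk rather than the inverted pendulum), the function names td_lambda_random_walk and rms_error, and all parameter values are assumptions chosen only to keep the example short and self-contained.

```python
import random


def td_lambda_random_walk(lam, alpha=0.1, gamma=1.0, episodes=200,
                          n_states=5, seed=0):
    """Tabular TD(lambda) policy evaluation on a random walk (illustrative task).

    The walk starts in the middle state and moves left or right with equal
    probability. Leaving on the right yields reward 1, on the left reward 0.
    """
    rng = random.Random(seed)
    values = [0.0] * n_states
    for _ in range(episodes):
        traces = [0.0] * n_states          # eligibility traces, reset per episode
        state = n_states // 2
        while True:
            next_state = state + rng.choice((-1, 1))
            done = next_state < 0 or next_state >= n_states
            reward = 1.0 if next_state >= n_states else 0.0
            next_value = 0.0 if done else values[next_state]

            # One-step temporal difference error.
            td_error = reward + gamma * next_value - values[state]

            # Decay all traces, then mark the current state as eligible.
            traces = [gamma * lam * e for e in traces]
            traces[state] += 1.0

            # Every state is updated in proportion to its trace.
            # With lam = 0 only the current state changes, which is TD(0).
            for s in range(n_states):
                values[s] += alpha * td_error * traces[s]

            if done:
                break
            state = next_state
    return values


def rms_error(values):
    """Root-mean-square error against the known true values (i + 1) / (n + 1)."""
    n = len(values)
    true_values = [(i + 1) / (n + 1) for i in range(n)]
    return (sum((v - t) ** 2 for v, t in zip(values, true_values)) / n) ** 0.5


if __name__ == "__main__":
    for lam in (0.0, 0.8):
        print(f"lambda={lam}: RMS error = {rms_error(td_lambda_random_walk(lam)):.3f}")
```

Running the script prints the remaining estimation error for each λ on this toy task. How the two variants compare depends on the step size, λ, and the task, so the output says nothing about the pendulum results reported in the paper.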
