Improving humanoid locomotive performance with learnt approximated dynamics via Gaussian processes for regression

We propose to improve the locomotive performance of humanoid robots by using approximated biped stepping and walking dynamics with reinforcement learning (RL). Although RL is a useful non-linear optimizer, it is usually difficult to apply to real robotic systems because of the large number of iterations required to acquire suitable policies. In this study, we first approximated the dynamics using data from a real robot, and then applied the estimated dynamics in RL to improve stepping and walking policies. Gaussian processes were used to approximate the dynamics. With Gaussian processes, we can estimate a probability distribution of a target function with a given covariance function, so RL can take the uncertainty of the approximated dynamics into account throughout the learning process. We show that the RL method with the approximated models improves stepping and walking policies in both simulated and real environments, and we present an experimental validation of the proposed approach on a real humanoid robot.
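
Below is a minimal sketch, not the authors' implementation, of how Gaussian process regression can approximate one-step dynamics and return both a predictive mean and a variance that a model-based RL update could use to account for model uncertainty. The RBF kernel, hyperparameters, toy pendulum-like data, and the class/function names are assumptions made for illustration.

```python
# Sketch: GP regression for a one-step dynamics model x_{t+1} ~ f(x_t, u_t).
# Kernel choice, hyperparameters, and data are illustrative assumptions only.
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between row-wise inputs A and B."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

class GPDynamics:
    """Given (state, action) inputs X and next-state targets y, predict a
    Gaussian distribution over the next state at query inputs."""
    def __init__(self, X, y, noise_var=1e-3):
        self.X, self.y = X, y
        K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
        self.L = np.linalg.cholesky(K)                  # K = L L^T
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, Xq):
        Kq = rbf_kernel(self.X, Xq)
        mean = Kq.T @ self.alpha                        # predictive mean
        v = np.linalg.solve(self.L, Kq)
        cov = rbf_kernel(Xq, Xq) - v.T @ v              # predictive covariance
        return mean, np.diag(cov)                       # mean, pointwise variance

# Toy usage with fake transitions (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 3))                    # [angle, velocity, torque]
y = X[:, 1] + 0.1 * X[:, 2] - 0.05 * np.sin(X[:, 0])    # fake "next velocity"
gp = GPDynamics(X, y)
mu, sigma2 = gp.predict(rng.uniform(-1, 1, size=(5, 3)))
# A model-based RL update can penalize or resample rollouts according to
# sigma2, so planning reflects uncertainty in the approximated dynamics.
```

In such a setup, the predictive variance is what lets the learner distrust the model in regions with little real-robot data, which is the role uncertainty plays in the approach described above.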
