Low-dimensional feature extraction for humanoid locomotion using kernel dimension reduction

We propose using kernel dimension reduction (KDR) to extract a low-dimensional feature space for humanoid locomotion tasks. Although humanoids have many degrees of freedom, the task-relevant feature space can be much smaller than the original state space. We apply the proposed approach to improve the locomotive performance of humanoid robots using the extracted low-dimensional state space. To improve locomotive performance, we use a reinforcement learning (RL) framework. Although RL is a useful nonlinear optimizer, it is usually difficult to apply to real robotic systems because of the large number of iterations required to acquire suitable policies. In this study, we run RL on the extracted low-dimensional feature space so that the learning system can improve task performance quickly. KDR allows us to extract the feature space even when the task-relevant mapping is nonlinear, which is essential for humanoid locomotion since stepping and walking involve highly nonlinear dynamics. We show that stepping and walking policies can be improved by applying an RL method in the feature space extracted by KDR.
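
The following is a minimal sketch, not the authors' implementation, of the KDR step described above, in the spirit of Fukumizu, Bach, and Jordan: find a projection matrix B so that the reduced state Z = X B retains the information in the full state X that is relevant for the task output Y, by minimizing the trace of a regularized kernel conditional covariance operator. The variable names, Gaussian kernel widths, regularization constant, and the crude numerical-gradient loop are all illustrative assumptions.

```python
# Sketch of kernel dimension reduction (KDR): learn B with orthonormal
# columns so that Y depends on X mainly through Z = X @ B.
# X: (n, d_x) states, Y: (n, d_y) task-relevant outputs. Illustrative only.
import numpy as np

def centered_gram(Z, sigma):
    """Centered Gaussian (RBF) Gram matrix of the rows of Z."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return H @ K @ H

def kdr_objective(B, X, Y, sigma_z=1.0, sigma_y=1.0, eps=1e-3):
    """Trace of the regularized conditional covariance surrogate:
    small values indicate Y is well explained by Z = X @ B alone."""
    n = X.shape[0]
    Gz = centered_gram(X @ B, sigma_z)
    Gy = centered_gram(Y, sigma_y)
    return np.trace(np.linalg.solve(Gz + n * eps * np.eye(n), Gy))

def kdr_fit(X, Y, dim, iters=200, lr=1e-2, seed=0):
    """Crude projected-gradient loop: numerical gradient on B, then
    re-orthonormalize the columns with a QR step (Stiefel projection)."""
    rng = np.random.default_rng(seed)
    B, _ = np.linalg.qr(rng.standard_normal((X.shape[1], dim)))
    h = 1e-5
    for _ in range(iters):
        base = kdr_objective(B, X, Y)
        grad = np.zeros_like(B)
        for i in range(B.shape[0]):
            for j in range(B.shape[1]):
                Bp = B.copy()
                Bp[i, j] += h
                grad[i, j] = (kdr_objective(Bp, X, Y) - base) / h
        B, _ = np.linalg.qr(B - lr * grad)
    return B
```

Under these assumptions, the fitted B defines the reduced state z = B^T x, and a standard RL method (for example, an actor-critic) would then be run on z rather than on the full humanoid state, which is the role the extracted feature space plays in the approach described above.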
