Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid

This paper describes a learning framework for a central pattern generator (CPG)-based biped locomotion controller using a policy gradient method. Our goals in this study are to achieve biped walking with a 3D hardware humanoid and to develop an efficient learning algorithm with a CPG by reducing the dimensionality of the state space used for learning. We demonstrate that an appropriate feedback controller can be acquired within a thousand trials in numerical simulation, and that the controller obtained in simulation achieves stable walking with a physical robot in the real world. Walking velocity and stability were evaluated in both numerical simulations and hardware experiments. Furthermore, we show the possibility of additional online learning on the hardware robot, which improves the controller within 200 iterations.
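The two core ingredients of the approach can be illustrated with a minimal sketch: a two-neuron Matsuoka oscillator as the CPG unit, and a REINFORCE-style policy gradient update of a scalar sensory-feedback gain. All parameter values, the Gaussian policy, and the quadratic reward are illustrative assumptions standing in for the paper's actual reward and feedback structure, not the authors' implementation.

```python
import numpy as np

# --- Two-neuron Matsuoka oscillator (the CPG building block). ---
# Parameter values are illustrative assumptions, not the paper's.
def matsuoka_step(state, feedback=0.0, dt=0.01,
                  tau=0.25, tau_a=0.5, beta=2.5, w=2.5, u0=1.0):
    x1, x2, v1, v2 = state
    y1, y2 = max(x1, 0.0), max(x2, 0.0)    # rectified firing rates
    dx1 = (-x1 - beta * v1 - w * y2 + u0 + feedback) / tau
    dx2 = (-x2 - beta * v2 - w * y1 + u0 - feedback) / tau
    dv1 = (-v1 + y1) / tau_a               # self-adaptation dynamics
    dv2 = (-v2 + y2) / tau_a
    return (x1 + dt * dx1, x2 + dt * dx2, v1 + dt * dv1, v2 + dt * dv2)

state = (0.1, 0.0, 0.0, 0.0)               # asymmetric start breaks symmetry
ys = []
for _ in range(3000):                      # 30 s of simulated time
    state = matsuoka_step(state)
    ys.append(max(state[0], 0.0) - max(state[1], 0.0))  # oscillator output

# --- REINFORCE update of a scalar feedback gain. ---
# A Gaussian policy samples the gain; the quadratic reward is a toy
# stand-in for the walking-performance reward, with optimum at k = 1.2.
rng = np.random.default_rng(0)
mu, sigma, alpha, baseline = 0.0, 0.5, 0.02, 0.0
for _ in range(3000):
    k = rng.normal(mu, sigma)              # sample a candidate feedback gain
    r = -(k - 1.2) ** 2                    # hypothetical reward signal
    baseline = 0.9 * baseline + 0.1 * r    # running-average baseline
    mu += alpha * (r - baseline) * (k - mu) / sigma ** 2  # log-likelihood trick
```

The mutual inhibition between the two neurons produces a sustained limit-cycle output `ys`, while the policy mean `mu` climbs the reward gradient toward the (toy) optimal gain; in the paper the reward is instead derived from the robot's walking performance and the learned gains shape the sensory feedback into the CPG.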
