Learning CPG Sensory Feedback with Policy Gradient for Biped Locomotion for a Full-Body Humanoid

This paper describes a learning framework for a central pattern generator (CPG)-based biped locomotion controller using a policy gradient method. Our goals in this study are to achieve biped walking with a 3D hardware humanoid and to develop an efficient learning algorithm with a CPG by reducing the dimensionality of the state space used for learning. We demonstrate that an appropriate feedback controller can be acquired within a thousand trials in numerical simulation, and that the controller obtained in simulation achieves stable walking with a physical robot in the real world. Walking velocity and stability were evaluated in both numerical simulations and hardware experiments. Furthermore, we show the possibility of additional online learning on the hardware robot, which improves the controller within 200 iterations.
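The two core ingredients of the approach can be illustrated with a minimal sketch: a two-neuron Matsuoka oscillator as the CPG unit, and a REINFORCE-style policy gradient update of a scalar sensory-feedback gain. All parameter values, the Gaussian policy, and the quadratic reward are illustrative assumptions standing in for the paper's actual reward and feedback structure, not the authors' implementation.

```python
import numpy as np

# --- Two-neuron Matsuoka oscillator (the CPG building block). ---
# Parameter values are illustrative assumptions, not the paper's.
def matsuoka_step(state, feedback=0.0, dt=0.01,
                  tau=0.25, tau_a=0.5, beta=2.5, w=2.5, u0=1.0):
    x1, x2, v1, v2 = state
    y1, y2 = max(x1, 0.0), max(x2, 0.0)    # rectified firing rates
    dx1 = (-x1 - beta * v1 - w * y2 + u0 + feedback) / tau
    dx2 = (-x2 - beta * v2 - w * y1 + u0 - feedback) / tau
    dv1 = (-v1 + y1) / tau_a               # self-adaptation dynamics
    dv2 = (-v2 + y2) / tau_a
    return (x1 + dt * dx1, x2 + dt * dx2, v1 + dt * dv1, v2 + dt * dv2)

state = (0.1, 0.0, 0.0, 0.0)               # asymmetric start breaks symmetry
ys = []
for _ in range(3000):                      # 30 s of simulated time
    state = matsuoka_step(state)
    ys.append(max(state[0], 0.0) - max(state[1], 0.0))  # oscillator output

# --- REINFORCE update of a scalar feedback gain. ---
# A Gaussian policy samples the gain; the quadratic reward is a toy
# stand-in for the walking-performance reward, with optimum at k = 1.2.
rng = np.random.default_rng(0)
mu, sigma, alpha, baseline = 0.0, 0.5, 0.02, 0.0
for _ in range(3000):
    k = rng.normal(mu, sigma)              # sample a candidate feedback gain
    r = -(k - 1.2) ** 2                    # hypothetical reward signal
    baseline = 0.9 * baseline + 0.1 * r    # running-average baseline
    mu += alpha * (r - baseline) * (k - mu) / sigma ** 2  # log-likelihood trick
```

The mutual inhibition between the two neurons produces a sustained limit-cycle output `ys`, while the policy mean `mu` climbs the reward gradient toward the (toy) optimal gain; in the paper the reward is instead derived from the robot's walking performance and the learned gains shape the sensory feedback into the CPG.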
