Paper information - Natural Policy Gradient Reinforcement Learning for a CPG Control of a Biped Robot

Motivated by the view that animals' rhythmic movements, such as locomotion, are controlled by neural circuits called central pattern generators (CPGs), researchers have studied CPG-based motor control mechanisms. As an autonomous learning framework for a CPG controller, we previously proposed a reinforcement learning (RL) method called the CPG-actor-critic method. In this article, we propose a natural policy gradient learning algorithm for the CPG-actor-critic method and apply it to an automatic control problem using a biped robot simulator. Computer simulations show that our RL method makes the biped robot walk stably on various terrains.
