Paper information - Acquiring of walking behavior for four-legged robots using actor-critic method based on policy gradient

The actor-critic method based on policy gradient is effective for problems with a high-dimensional learning space. In robot walking, the movement of each joint is cyclic. To represent this cyclic movement, we use closed-loop B-spline curves. We therefore propose a method that combines the actor-critic method based on policy gradient with B-spline curves. In this method, a B-spline curve is used to control targets, such as the robot's joints, in continuous space. With a B-spline curve, the controlled target can move smoothly and flexibly. In this paper, we apply the method to a four-legged robot with 12 joints and simulate it on the Open Dynamics Engine (ODE), a dynamics simulation package. The simulation results show that the robot succeeded in acquiring walking behavior.
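As an illustration of the closed-loop B-spline idea in the abstract, the following is a minimal sketch of how a cyclic joint-angle trajectory could be evaluated from a small set of control points. It assumes a uniform cubic B-spline with wrap-around indexing; the paper's actual spline degree, parameterization, and learning update are not given here, and the function name `closed_cubic_bspline` and the sample control points are hypothetical.

```python
import math


def closed_cubic_bspline(control_points, phase):
    """Evaluate a closed uniform cubic B-spline at a gait phase.

    control_points: list of floats (hypothetical joint-angle parameters).
    phase: any float; the curve repeats with period 1.
    """
    n = len(control_points)
    if n < 4:
        raise ValueError("this sketch assumes at least 4 control points")

    # Map the phase onto the n spline segments, wrapping around the loop.
    s = (phase % 1.0) * n
    i = int(math.floor(s)) % n   # index of the current segment
    u = s - math.floor(s)        # local parameter in [0, 1)

    # Uniform cubic B-spline basis weights (they always sum to 1).
    b0 = (1 - u) ** 3 / 6
    b1 = (3 * u**3 - 6 * u**2 + 4) / 6
    b2 = (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6
    b3 = u**3 / 6

    # Wrap-around indexing is what closes the curve into a loop.
    p = [control_points[(i + k) % n] for k in range(4)]
    return b0 * p[0] + b1 * p[1] + b2 * p[2] + b3 * p[3]


if __name__ == "__main__":
    # Hypothetical control points for one joint, in radians.
    joint_params = [0.0, 0.5, 0.0, -0.5]
    for step in range(8):
        phase = step / 8
        print(f"phase={phase:.3f} angle={closed_cubic_bspline(joint_params, phase):+.4f}")
```

In a setup like the one the abstract describes, a learner would adjust the control points while the spline guarantees a smooth, periodic target for each joint. Treating the control points as the learned parameters is an assumption of this sketch, not a detail confirmed by the abstract.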

Hajime Igarashi | Kota Watanabe | Ryo Inoue

[1] Wang Zhan-quan. Reinforcement Learning Theory, Algorithms and Application, 2006.

[2] Jun Morimoto, et al. Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot, 2005, 5th IEEE-RAS International Conference on Humanoid Robots.

[3] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.

[4] Ralf Der, et al. A Sensor-Based Learning Algorithm for the Self-Organization of Robot Behavior, 2009, Algorithms.

[5] Yishay Mansour, et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation, 1999, NIPS.

[6] Jun Morimoto, et al. Learning CPG-based Biped Locomotion with a Policy Gradient Method, 2005, 5th IEEE-RAS International Conference on Humanoid Robots.

[7] Vijay R. Konda, et al. On Actor-Critic Algorithms, 2003, SIAM J. Control. Optim.