Acquisition of walking behavior for four-legged robots using an actor-critic method based on policy gradient

The actor-critic method based on policy gradient is effective for problems with a high-dimensional learning space. In the walking behavior of robots, the movement of each joint is cyclic. To represent this cyclic movement, we use closed-loop B-spline curves. We therefore propose a method that combines the actor-critic method based on policy gradient with B-spline curves. In this method, a B-spline curve is used to control targets, such as the joints of a robot, in continuous space. Using a B-spline curve allows the controlled target to move smoothly and flexibly. In this paper, we apply the method to a four-legged robot with 12 joints and simulate it in the Open Dynamics Engine (ODE), a dynamics simulation software package. The simulation results show that the robot succeeded in acquiring walking behavior.
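
As an illustration of how a closed B-spline curve can describe a cyclic joint movement, the following minimal sketch evaluates a closed uniform cubic B-spline at a given gait phase. The abstract does not state the spline degree, the knot layout, or how the curve is parameterized, so the cubic uniform form, the function name `closed_bspline`, and the use of scalar joint angles as control points are all assumptions made for this example only.

```python
import math


def closed_bspline(control_points, phase):
    """Evaluate a closed uniform cubic B-spline at a phase in [0, 1).

    control_points: list of floats (hypothetical target joint angles).
    phase: position in the gait cycle; values outside [0, 1) wrap around.
    """
    n = len(control_points)
    if n < 4:
        # Simplifying assumption of this sketch, not a requirement of the paper.
        raise ValueError("need at least 4 control points")

    # Map the phase onto n curve segments; the modulo makes the curve periodic.
    s = (phase % 1.0) * n
    i = int(math.floor(s)) % n   # index of the active segment
    u = s - math.floor(s)        # local parameter within the segment, in [0, 1)

    # Uniform cubic B-spline basis functions (they sum to 1 for any u).
    b0 = (1 - u) ** 3 / 6
    b1 = (3 * u ** 3 - 6 * u ** 2 + 4) / 6
    b2 = (-3 * u ** 3 + 3 * u ** 2 + 3 * u + 1) / 6
    b3 = u ** 3 / 6

    # Wrap the control point indices so the last segment joins the first.
    p = [control_points[(i + k) % n] for k in range(4)]
    return b0 * p[0] + b1 * p[1] + b2 * p[2] + b3 * p[3]


# Hypothetical control points for one joint, sampled over one gait cycle.
angles = [0.0, 1.0, 0.0, -1.0]
trajectory = [closed_bspline(angles, t / 20) for t in range(20)]
```

Because the control point indices wrap around, the curve returns to its starting value after one cycle and stays smooth across the wrap. In a learning setup of the kind the abstract describes, one plausible design is for the actor to adjust the control points while the phase advances with time, but that mapping is an assumption here rather than something the abstract specifies.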