Generation of locally optimal trajectories against moving obstacles using Gaussian sampling

Differential Dynamic Programming (DDP) can effectively solve an optimal control problem; however, it cannot cope with temporally changing environments, such as the appearance of a moving obstacle. In this paper, we present the generation of locally optimal trajectories in an environment with a moving obstacle. The agent finds locally optimal trajectories by drawing Gaussian samples according to its existing, incomplete policy. After each episode of agent movement, if that episode performs better than the previous one, the policy is reinforced by learning from the episode's trajectories. We show that the algorithm successfully generates locally optimal trajectories that avoid the moving obstacle, and that the performance of the resulting policy improves as episodes progress. These results can help apply reinforcement learning to robotics in two respects: learning a policy in a small number of iterations by reusing a DDP policy, and reacting to changing environments. Because a real-world robot must cope with a changing environment and can afford only a limited number of learning iterations, both results address key issues in applying reinforcement learning to robotics.
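The episodic scheme described above can be illustrated with a minimal sketch, not the authors' implementation: a point agent moves toward a goal on a 2-D plane while an obstacle crosses its path, action sequences are sampled from a Gaussian centered on the current best policy, and an episode replaces the policy only if it achieves lower cost. All dynamics, cost terms, and parameter values here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 40                         # episode length (time steps)
GOAL = np.array([1.0, 0.0])    # goal position
SIGMA = 0.05                   # std. dev. of Gaussian exploration noise


def obstacle_pos(t):
    # moving obstacle crossing the straight line to the goal
    return np.array([0.5, 0.5 - 0.025 * t])


def episode_cost(actions):
    # roll out one episode and accumulate cost:
    # distance to goal plus a penalty for being near the obstacle
    x = np.zeros(2)
    cost = 0.0
    for t in range(T):
        x = x + actions[t]
        d_obs = np.linalg.norm(x - obstacle_pos(t))
        cost += np.linalg.norm(x - GOAL) + 5.0 * np.exp(-(d_obs / 0.1) ** 2)
    return cost


# nominal ("incomplete") policy: small constant steps straight toward the
# goal, standing in for a DDP solution computed without the obstacle
nominal = np.tile(GOAL / T, (T, 1))
best_actions, best_cost = nominal, episode_cost(nominal)

for episode in range(200):
    # sample an action sequence from a Gaussian centered on the current
    # policy; keep it only if the episode performs better than the best so far
    candidate = best_actions + rng.normal(0.0, SIGMA, size=(T, 2))
    cost = episode_cost(candidate)
    if cost < best_cost:
        best_actions, best_cost = candidate, cost

print(best_cost < episode_cost(nominal))
```

Keeping a candidate only when it beats the previous best mirrors the paper's accept-if-better rule; in a full implementation the accepted trajectories would update a parameterized policy rather than just the stored action sequence.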
