Control by Gradient Collocation: Applications to optimal obstacle avoidance and minimum torque control

We present a new machine learning algorithm for learning optimal feedback control policies that guide a robot to a goal in the presence of obstacles. Our method first reduces obstacle avoidance to a control problem with continuous state, action, and time, and then uses efficient collocation methods to solve for an optimal feedback control policy. This formulation improves on standard approaches, such as potential field methods, in four ways: it is resistant to local minima, allows for moving obstacles, handles stochastic systems, and computes feedback control strategies that take into account the robot's (possibly non-linear) dynamics. In addition to contributing a new method for obstacle avoidance, our work advances the state of the art in collocation methods for non-linear stochastic optimal control problems in two important ways: (1) we show that incorporating the gradient and second-order derivatives of the optimal value function at the collocation points lets us exploit derivative information about the system dynamics, and (2) we show that computational savings can be achieved by directly fitting the gradient of the optimal value function rather than the optimal value function itself. We validate our approach on three problems: non-convex obstacle avoidance for a point-mass robot, obstacle avoidance for a 2-degree-of-freedom robotic manipulator, and optimal control of a non-linear dynamical system.
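
To make point (2) concrete, the sketch below fits the gradient of the value function directly by collocation on a toy problem. It is an illustration under stated assumptions, not the algorithm of the paper: the 1-D linear system, the quadratic cost, the monomial basis, the Gauss-Newton solver, and all names (`features`, `hjb_residual`, `fit_value_gradient`, `policy`) are hypothetical choices made here. The problem is deterministic, so the second-order term of the stochastic Hamilton-Jacobi-Bellman (HJB) equation does not appear.

```python
import numpy as np

# Hypothetical 1-D test problem (not from the paper):
# dynamics dx/dt = a*x + b*u, running cost q*x^2 + r*u^2, infinite horizon.
a, b, q, r = 1.0, 1.0, 1.0, 1.0

def features(x):
    """Basis for the value gradient g(x) ~ dV/dx: odd monomials x and x^3."""
    return np.stack([x, x**3], axis=1)

def hjb_residual(w, x):
    """HJB residual with the minimizing control u* = -b*g/(2r) substituted in."""
    g = features(x) @ w
    return q * x**2 + a * x * g - (b**2 / (4.0 * r)) * g**2

def hjb_jacobian(w, x):
    """Derivative of each collocation residual with respect to the weights."""
    phi = features(x)
    g = phi @ w
    return (a * x - (b**2 / (2.0 * r)) * g)[:, None] * phi

def fit_value_gradient(x_colloc, w0, iters=50):
    """Gauss-Newton iteration on the residuals at the collocation points."""
    w = np.array(w0, dtype=float)
    for _ in range(iters):
        res = hjb_residual(w, x_colloc)
        jac = hjb_jacobian(w, x_colloc)
        step, *_ = np.linalg.lstsq(jac, -res, rcond=None)
        w = w + step
    return w

def policy(w, x):
    """Feedback law recovered from the fitted gradient."""
    return -(b / (2.0 * r)) * (features(x) @ w)

x_colloc = np.linspace(-1.0, 1.0, 21)          # collocation points
w = fit_value_gradient(x_colloc, w0=[4.0, 0.0])

# Reference solution: V(x) = p*x^2 with p the positive root of the scalar
# Riccati equation q + 2*a*p - b^2*p^2/r = 0, so dV/dx = 2*p*x.
p = r * (a + np.sqrt(a**2 + q * b**2 / r)) / b**2
print("fitted weights:", w, "expected leading weight:", 2.0 * p)
```

For this toy problem the HJB residual has two roots, and the positive initial guess `w0` is what steers the iteration toward the stabilizing one; a practical solver would need a more principled way to select the correct solution. Because only the gradient is parametrized, the control law is read directly from the fitted weights without differentiating a value-function approximation.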
