Differential dynamic programming with nonlinear constraints

Differential dynamic programming (DDP) is a widely used trajectory optimization technique that addresses nonlinear optimal control problems and can readily handle nonlinear cost functions. However, it does not handle state or control constraints. This paper presents a novel formulation of DDP that accommodates arbitrary nonlinear inequality constraints on both state and control. The main insight in standard DDP is that a quadratic approximation of the value function can be derived using a recursive backward pass; however, the recursive formulae are valid only for unconstrained problems. The main technical contribution of the presented method is a derivation of the recursive quadratic approximation formula in the presence of nonlinear constraints, after a set of active constraints has been identified at each point in time. This formula is used in a new Constrained-DDP (CDDP) algorithm that iteratively determines these active sets and is guaranteed to converge toward a local minimum. CDDP is demonstrated on several underactuated optimal control problems up to 12D with obstacle avoidance and control constraints, and is shown to outperform other methods for accommodating constraints.
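For orientation, the recursive backward pass of standard, unconstrained DDP can be sketched as follows. This is a minimal illustration, not the constrained recursion derived in the paper: it assumes a scalar linear system with quadratic cost, for which the second-order dynamics terms vanish, and the function name `backward_pass` and the parameters `a`, `b`, `q`, `r`, `qf` are hypothetical names chosen for this example.

```python
def backward_pass(xs, us, a, b, q, r, qf):
    """Unconstrained DDP backward pass for a scalar linear-quadratic problem.

    Assumed dynamics: x[k+1] = a*x[k] + b*u[k]
    Assumed cost:     sum_k 0.5*(q*x[k]**2 + r*u[k]**2) + 0.5*qf*x[N]**2
    xs has N+1 nominal states, us has N nominal controls.
    Returns feedforward terms ks, feedback gains Ks, and (V_x, V_xx) at k = 0.
    """
    N = len(us)
    # Terminal derivatives of the value function at the nominal final state.
    V_x, V_xx = qf * xs[N], qf
    ks, Ks = [0.0] * N, [0.0] * N
    for k in reversed(range(N)):
        # Derivatives of Q(x, u) = l(x, u) + V(f(x, u)) at the nominal point.
        Q_x = q * xs[k] + a * V_x
        Q_u = r * us[k] + b * V_x
        Q_xx = q + a * V_xx * a
        Q_uu = r + b * V_xx * b
        Q_ux = b * V_xx * a
        # Minimizing the quadratic model over du gives du = ks[k] + Ks[k]*dx.
        ks[k] = -Q_u / Q_uu
        Ks[k] = -Q_ux / Q_uu
        # Quadratic approximation of the value function one step earlier.
        V_x = Q_x - Q_ux * Q_u / Q_uu
        V_xx = Q_xx - Q_ux * Q_ux / Q_uu
    return ks, Ks, V_x, V_xx


# One-step example: x0 = 1, nominal u0 = 0, so the nominal x1 = 1.
ks, Ks, V_x, V_xx = backward_pass([1.0, 1.0], [0.0], a=1.0, b=1.0, q=0.0, r=1.0, qf=1.0)
print(ks, Ks, V_x, V_xx)
```

In this one-step example the feedforward term recovers the exact minimizer of 0.5*u**2 + 0.5*(1 + u)**2, because the problem is already quadratic. The step that minimizes the quadratic model freely over the control change is exactly what stops being valid once inequality constraints are active, which is the gap the constrained formulation addresses.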
