Continuous-time differential dynamic programming with terminal constraints

In this work, we revisit the continuous-time Differential Dynamic Programming (DDP) approach for solving optimal control problems with terminal state constraints. We derive two algorithms, each corresponding to a different order of expansion of the system dynamics, and investigate their performance in terms of convergence speed. Compared to previous work, we provide a set of backward differential equations for the value function expansion by relaxing the assumption that the initial nominal control must be very close to the optimal control solution. We apply the derived algorithms to two classical optimal control problems, namely the inverted pendulum and the Dreyfus rocket problem, and show the benefit of the second-order expansion.
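
As a point of reference for the structure of such methods, the sketch below runs a single backward/forward sweep of a first-order, unconstrained continuous-time DDP (an iLQR-style pass) on an inverted pendulum: the value function is expanded to second order in the state deviation, and its coefficients are obtained by integrating Riccati-type ODEs backward along a nominal trajectory. It is only an illustration of the generic machinery, not the algorithms derived in this work: the terminal-constraint handling and the second-order expansion of the dynamics are omitted, and the pendulum model, cost weights, and horizon are assumptions made for the example.

```python
# Minimal sketch (not the paper's algorithm): one backward/forward sweep of
# first-order, unconstrained continuous-time DDP (iLQR-style) on a pendulum.
# Value function expansion: V(dx, t) ~ c(t) + s(t)'dx + 0.5 dx' S(t) dx.
import numpy as np

g, dt, T = 9.81, 0.01, 1.5            # gravity, step size, horizon (assumed)
Q  = np.diag([10.0, 1.0])             # running state cost (assumed weights)
R  = np.array([[0.1]])                # running control cost (assumed weight)
Qf = np.diag([100.0, 10.0])           # terminal cost standing in for the terminal constraint

def f(x, u):
    # x = [theta, theta_dot], theta = 0 upright; unit mass and length assumed.
    return np.array([x[1], g * np.sin(x[0]) + u[0]])

def jacobians(x):
    A = np.array([[0.0, 1.0], [g * np.cos(x[0]), 0.0]])   # df/dx at the nominal point
    B = np.array([[0.0], [1.0]])                           # df/du
    return A, B

# Nominal rollout with zero control (explicit Euler).
N = int(T / dt)
xs = np.zeros((N + 1, 2)); xs[0] = [0.2, 0.0]
us = np.zeros((N, 1))
for k in range(N):
    xs[k + 1] = xs[k] + dt * f(xs[k], us[k])

# Backward pass: Euler-integrate the Riccati-type ODEs along the nominal trajectory,
#   -dS/dt = Q + A'S + SA - S B R^{-1} B' S,          S(T) = Qf,
#   -ds/dt = q + (A - B K)' s - K' r,  K = R^{-1}B'S,  s(T) = Qf x_nom(T),
# where q, r are the running-cost gradients at the nominal state and control.
S, s = Qf.copy(), Qf @ xs[N]
K  = np.zeros((N, 1, 2))               # feedback gains
kf = np.zeros((N, 1))                  # feedforward terms
for k in reversed(range(N)):
    A, B = jacobians(xs[k])
    q, r = Q @ xs[k], R @ us[k]
    K[k]  = np.linalg.solve(R, B.T @ S)
    kf[k] = np.linalg.solve(R, r + B.T @ s)
    S_rhs = Q + A.T @ S + S @ A - S @ B @ K[k]         # equals -dS/dt
    s_rhs = q + (A - B @ K[k]).T @ s - K[k].T @ r      # equals -ds/dt
    S, s = S + dt * S_rhs, s + dt * s_rhs              # Euler step backward in time

# Forward pass with the locally optimal correction u = u_nom - kf - K (x - x_nom).
# A full DDP scheme would repeat such sweeps (with a line search) until convergence.
x = xs[0].copy()
for k in range(N):
    u = us[k] - kf[k] - K[k] @ (x - xs[k])
    x = x + dt * f(x, u)
print("state after one sweep:", x)
```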
