Paper information - Stochastic Differential Dynamic Programming

Stochastic Differential Dynamic Programming

Although there has been a significant amount of work in stochastic optimal control theory towards the development of new algorithms, the problem of how to control a stochastic nonlinear system remains an open research topic. Recent iterative linear-quadratic optimal control methods such as iLQG handle control- and state-multiplicative noise, but they are derived from a first-order approximation of the dynamics. On the other hand, methods such as Differential Dynamic Programming (DDP) expand the dynamics up to second order, but so far they can handle only nonlinear systems with additive noise. In this work we present a generalization of the classic Differential Dynamic Programming algorithm. We assume the existence of state- and control-multiplicative process noise, and proceed to derive the second-order expansion of the cost-to-go. We find the correction terms that arise from the stochastic assumption. Although the initial expression contains quartic and cubic terms, we show that these vanish, leaving the same quadratic structure as standard DDP.
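To make the distinction between additive and multiplicative noise concrete, the following is a minimal sketch of simulating a scalar system whose noise magnitude scales with the state and the control. It is not taken from the paper: the drift `drift`, the noise gain `diffusion`, the coefficients `c_state` and `c_control`, and the Euler-Maruyama discretization are all assumptions chosen for illustration.

```python
import math
import random


def drift(x, u):
    # Hypothetical nonlinear drift f(x, u); chosen only for illustration.
    return -math.sin(x) + u


def diffusion(x, u, c_state=0.5, c_control=0.2):
    # Assumed state- and control-multiplicative noise gain F(x, u).
    # The noise magnitude scales with x and u and vanishes at x = u = 0,
    # unlike additive noise, whose gain would be a constant.
    return c_state * x + c_control * u


def euler_maruyama_step(x, u, dt, rng):
    # One step of x_next = x + f(x, u) dt + F(x, u) sqrt(dt) xi, xi ~ N(0, 1).
    xi = rng.gauss(0.0, 1.0)
    return x + drift(x, u) * dt + diffusion(x, u) * math.sqrt(dt) * xi


def rollout(x0, controls, dt=0.01, seed=0):
    # Simulate one noisy trajectory under a fixed open-loop control sequence.
    rng = random.Random(seed)
    xs = [x0]
    for u in controls:
        xs.append(euler_maruyama_step(xs[-1], u, dt, rng))
    return xs
```

Calling `rollout(1.0, [0.1] * 10)` returns a list of 11 states. Starting at `x0 = 0.0` with zero controls, the trajectory stays at the origin because both the drift and the noise gain are zero there, which is the behaviour that separates this noise model from the additive case.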
