Paper information - High-order local dynamic programming

We describe a new local dynamic programming algorithm for solving stochastic continuous optimal control problems. We use cubature integration both to propagate the state distribution and to perform the Bellman backup. The algorithm can approximate the local policy and cost-to-go with arbitrary function bases. We compare the classic quadratic cost-to-go/linear-feedback controller to a cubic cost-to-go/quadratic-policy controller on a 10-dimensional simulated swimming robot, and find that the higher-order approximation yields a more general policy with a larger basin of attraction.
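As a rough illustration of the two uses of cubature integration mentioned in the abstract, the sketch below uses the standard third-degree spherical-radial rule (2n equally weighted points at the mean plus or minus sqrt(n) times the columns of a Cholesky factor of the covariance). This is an assumption made for illustration: the abstract does not say which cubature rule the authors use, and the function names (`cubature_points`, `expectation`, `propagate`, `bellman_backup`) and the additive Gaussian noise model are hypothetical, not taken from the paper.

```python
import math

def cholesky(A):
    """Lower-triangular L with L L^T = A (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def cubature_points(mean, cov):
    """2n points mean +/- sqrt(n) * column_j(chol(cov)); each has weight 1/(2n)."""
    n = len(mean)
    L = cholesky(cov)
    scale = math.sqrt(n)
    points = []
    for j in range(n):
        col = [scale * L[i][j] for i in range(n)]
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
    return points

def expectation(g, mean, cov):
    """Approximate E[g(x)] for x ~ N(mean, cov), where g returns a float."""
    pts = cubature_points(mean, cov)
    return sum(g(p) for p in pts) / len(pts)

def propagate(f, mean, cov):
    """Approximate the mean and covariance of f(x) for x ~ N(mean, cov)."""
    ys = [f(p) for p in cubature_points(mean, cov)]
    w = 1.0 / len(ys)
    m = len(ys[0])
    new_mean = [w * sum(y[i] for y in ys) for i in range(m)]
    new_cov = [[w * sum((y[i] - new_mean[i]) * (y[j] - new_mean[j]) for y in ys)
                for j in range(m)] for i in range(m)]
    return new_mean, new_cov

def bellman_backup(cost, dynamics, value_next, x, u, noise_cov):
    """Q(x, u) = cost(x, u) + E[value_next(x')], x' ~ N(dynamics(x, u), noise_cov).

    Assumes additive Gaussian noise on the next state (an illustrative choice).
    """
    return cost(x, u) + expectation(value_next, dynamics(x, u), noise_cov)
```

This rule integrates polynomials up to degree three exactly against a Gaussian, so a quadratic cost-to-go is backed up without error; higher-order bases such as the cubic cost-to-go in the abstract would generally call for a higher-degree rule.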

Yuval Tassa | Emanuel Todorov

[1] David Q. Mayne, et al. Differential dynamic programming, 1972, The Mathematical Gazette.

[2] A. Stroud. Approximate calculation of multiple integrals, 1973.

[3] Rémi Coulom, et al. Reinforcement Learning Using Neural Networks, with Applications to Motor Control. (Apprentissage par renforcement utilisant des réseaux de neurones, avec des applications au contrôle moteur), 2002.

[4] E. Todorov. Optimality principles in sensorimotor control, 2004, Nature Neuroscience.

[5] David L. Darmofal, et al. Higher-Dimensional Integration with Gaussian Weight for Applications in Probabilistic Design, 2005, SIAM J. Sci. Comput.

[6] E. Todorov, et al. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems, 2005, Proceedings of the 2005 American Control Conference.

[7] William D. Smart, et al. Receding Horizon Differential Dynamic Programming, 2007, NIPS.

[8] S. Haykin, et al. Cubature Kalman Filters, 2009, IEEE Transactions on Automatic Control.

[9] Yuval Tassa, et al. Iterative local dynamic programming, 2009, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning.