A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems

We present an iterative linear-quadratic-Gaussian method for locally-optimal feedback control of nonlinear stochastic systems subject to control constraints. Previously, similar methods have been restricted to deterministic unconstrained problems with quadratic costs. The new method constructs an affine feedback control law, obtained by minimizing a novel quadratic approximation to the optimal cost-to-go function. Global convergence is guaranteed through a Levenberg-Marquardt method; convergence in the vicinity of a local minimum is quadratic. Performance is illustrated on a limited-torque inverted pendulum problem, as well as a complex biomechanical control problem involving a stochastic model of the human arm, with 10 state dimensions and 6 muscle actuators. A Matlab implementation of the new algorithm is available at www.cogsci.ucsd.edu/~todorov.
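To make the idea of an affine control law obtained from a quadratic approximation more concrete, the following is a minimal scalar sketch in Python. It is not the algorithm of the paper: the function name `affine_control_law`, the coefficient names `g`, `G`, `H`, the damping parameter `lam`, and the rule of switching off the feedback gain when the open-loop term hits a bound are all assumptions made for illustration. It assumes the local cost-to-go model in the control change `du` and state deviation `dx` has the form `g*du + du*G*dx + 0.5*H*du**2`, regularizes the curvature in a Levenberg-Marquardt style, and clamps the result to the control limits.

```python
def affine_control_law(g, G, H, u_nominal, u_min, u_max, lam=1e-3):
    """Scalar illustration (hypothetical helper, not the paper's code).

    Minimizes the assumed local model
        a(du, dx) = g*du + du*G*dx + 0.5*H*du**2
    over du and returns (l, L) such that du = l + L*dx.
    """
    # Levenberg-Marquardt-style damping: discard negative curvature and
    # add lam so the regularized curvature is strictly positive (lam > 0).
    H_reg = max(H, 0.0) + lam

    # Unconstrained minimizer of the regularized quadratic.
    l = -g / H_reg
    L = -G / H_reg

    # Keep the open-loop control u_nominal + l inside [u_min, u_max].
    l_clamped = min(max(l, u_min - u_nominal), u_max - u_nominal)

    # Assumption: if the open-loop term was clipped, drop the feedback
    # gain so the law does not push further against the bound.
    if l_clamped != l:
        L = 0.0
    return l_clamped, L


if __name__ == "__main__":
    print(affine_control_law(g=2.0, G=1.0, H=4.0,
                             u_nominal=0.0, u_min=-1.0, u_max=1.0, lam=1.0))
```

With a larger `lam` the step shrinks toward zero, which is the usual way damping trades speed for robustness; with `lam` near zero and positive `H` the step approaches the plain Newton step on the quadratic model.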
