Optimal Control with Adaptive Internal Dynamics Models

Optimal feedback control has been proposed as an attractive movement generation strategy in goal-reaching tasks for anthropomorphic manipulator systems. The optimal feedback control law for systems with non-linear dynamics and non-quadratic costs can be found by iterative methods, such as the iterative Linear Quadratic Gaussian (iLQG) algorithm. So far, this framework has relied on an analytic form of the system dynamics, which may often be unknown, difficult to estimate for more realistic control systems, or subject to frequent systematic changes. In this paper, we present a novel combination of the iLQG framework with a learned forward dynamics model. Utilising such adaptive internal models can compensate for complex dynamic perturbations of the controlled system in an online fashion. The specific adaptive framework introduced lends itself to a computationally more efficient implementation of the iLQG optimisation without sacrificing control accuracy, allowing the method to scale to large-DoF systems.
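To illustrate the idea of optimising feedback controls against a learned rather than analytic dynamics model, the following is a minimal Python sketch, not the authors' implementation. It uses an iLQR-style recursion (the deterministic core of iLQG) and, as a stand-in for an incremental non-parametric learner, a simple recursive least-squares linear model that is adapted online from executed trajectories. The toy point-mass plant, the `OnlineLinearModel` class, the cost weights, and the exploration noise are all illustrative assumptions.

```python
import numpy as np

dt = 0.01

def true_dynamics(x, u):
    # "Unknown" plant: a point mass with unmodelled viscous friction.
    pos, vel = x
    acc = u[0] - 0.4 * vel
    return np.array([pos + dt * vel, vel + dt * acc])

class OnlineLinearModel:
    """Recursive least-squares fit of x_{t+1} = W [x; u; 1] (adaptive internal model)."""
    def __init__(self, nx, nu, lam=1e-2):
        nz = nx + nu + 1
        self.W = np.zeros((nx, nz))
        self.P = np.eye(nz) / lam

    def _z(self, x, u):
        return np.concatenate([x, u, [1.0]])

    def update(self, x, u, x_next):
        z = self._z(x, u)
        Pz = self.P @ z
        k = Pz / (1.0 + z @ Pz)
        self.W += np.outer(x_next - self.W @ z, k)
        self.P -= np.outer(k, Pz)

    def predict(self, x, u):
        return self.W @ self._z(x, u)

    def linearise(self, nx, nu):
        # Jacobians of the learned model are read off directly from W.
        return self.W[:, :nx], self.W[:, nx:nx + nu]

def ilqr(model, x0, x_goal, T, nu=1, iters=20, R=1e-3, Qf=100.0):
    """iLQR loop that linearises the *learned* forward model, not analytic dynamics."""
    nx = x0.size
    U = np.zeros((T, nu))
    for _ in range(iters):
        # Forward pass through the internal model.
        X = [x0]
        for t in range(T):
            X.append(model.predict(X[-1], U[t]))
        X = np.array(X)
        A, B = model.linearise(nx, nu)      # constant here, since the model is linear
        # Backward pass: LQR recursion with terminal cost Qf and control cost R.
        Vx = Qf * (X[-1] - x_goal)
        Vxx = Qf * np.eye(nx)
        gains = []
        for t in reversed(range(T)):
            Qx = A.T @ Vx
            Qu = R * U[t] + B.T @ Vx
            Qxx = A.T @ Vxx @ A
            Quu = R * np.eye(nu) + B.T @ Vxx @ B
            Qux = B.T @ Vxx @ A
            k = -np.linalg.solve(Quu, Qu)
            K = -np.linalg.solve(Quu, Qux)
            gains.append((k, K))
            Vx = Qx + K.T @ Quu @ k + K.T @ Qu + Qux.T @ k
            Vxx = Qxx + K.T @ Quu @ K + K.T @ Qux + Qux.T @ K
        gains.reverse()
        # Update the nominal control sequence with the new feedback law.
        x = x0.copy()
        for t in range(T):
            k, K = gains[t]
            U[t] = U[t] + k + K @ (x - X[t])
            x = model.predict(x, U[t])
    return U

if __name__ == "__main__":
    model = OnlineLinearModel(nx=2, nu=1)
    x_goal = np.array([1.0, 0.0])
    # Interleave control optimisation with online adaptation of the internal model.
    for episode in range(5):
        x = np.zeros(2)
        U = ilqr(model, x, x_goal, T=100)
        for u in U:
            u_exec = u + 0.1 * np.random.randn(1)   # small exploration noise for excitation
            x_next = true_dynamics(x, u_exec)
            model.update(x, u_exec, x_next)          # adapt the internal model online
            x = x_next
        print(f"episode {episode}: final state {x}")
```

Because the internal model is refitted from executed data, the same loop compensates for systematic changes in the plant (e.g. altered friction) without access to analytic dynamics; a locally weighted incremental learner would replace the linear RLS model for genuinely non-linear, high-DoF systems.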
