Compositionality of optimal control laws

We present a theory of compositionality in stochastic optimal control, showing how task-optimal controllers can be constructed from a set of primitives. The primitives are themselves feedback controllers pursuing their own agendas. They are mixed in proportion to how much progress they are making towards their agendas and how compatible their agendas are with the present task. The resulting composite control law is provably optimal when the problem belongs to a certain class. This class is rather general and yet has a number of unique properties, one of which is that the Bellman equation can be made linear even for non-linear or discrete dynamics. This linearity gives rise to the compositionality developed here. In the special case of linear dynamics and Gaussian noise, our framework yields analytical solutions (i.e., non-linear mixtures of LQG controllers) without requiring the final cost to be quadratic. More generally, a natural set of control primitives can be constructed by applying singular value decomposition (SVD) to the Green's function of the Bellman equation. We illustrate the theory in the context of human arm movements. The ideas of optimality and compositionality are both prominent in the field of motor control, yet they have been difficult to reconcile; our work makes this reconciliation possible.
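To make the mixing rule concrete, the sketch below illustrates compositionality on a discrete first-exit linearly-solvable MDP. The chain dynamics, cost values, and weights are illustrative assumptions; the framework itself supplies only the linear Bellman equation z(x) = exp(-q(x)) Σ_{x'} p(x'|x) z(x') for the desirability function z, and the identity z = Σ_i w_i z_i when the composite final cost satisfies exp(-q_f) = Σ_i w_i exp(-q_f^i).

```python
import numpy as np

# Minimal sketch (illustrative assumptions throughout): a first-exit
# linearly-solvable MDP on a chain, with two absorbing terminal states.
n_interior = 8
n = n_interior + 2                     # states 0..7 interior, 8..9 terminal

# Passive dynamics p(x'|x): unbiased random walk; stepping off either
# end of the chain lands in the corresponding terminal state.
P = np.zeros((n, n))
for x in range(n_interior):
    left = x - 1 if x > 0 else n_interior                     # state 8: left exit
    right = x + 1 if x < n_interior - 1 else n_interior + 1   # state 9: right exit
    P[x, left] = P[x, right] = 0.5

q_state = 0.1 * np.ones(n_interior)    # running state cost at interior states

def solve_lmdp(q_final):
    """Fixed-point iteration on z(x) = exp(-q(x)) * sum_x' p(x'|x) z(x')."""
    z = np.ones(n)
    z[n_interior:] = np.exp(-q_final)  # boundary condition at terminal states
    for _ in range(10_000):
        z_new = np.exp(-q_state) * (P[:n_interior] @ z)
        done = np.max(np.abs(z_new - z[:n_interior])) < 1e-13
        z[:n_interior] = z_new
        if done:
            break
    return z

# Two primitive tasks: each makes a different exit cheap.
qf1 = np.array([0.0, 5.0])             # primitive 1 prefers the left exit
qf2 = np.array([5.0, 0.0])             # primitive 2 prefers the right exit
z1, z2 = solve_lmdp(qf1), solve_lmdp(qf2)

# Composite task whose final cost satisfies
# exp(-q_f) = w1 * exp(-qf1) + w2 * exp(-qf2).
w1, w2 = 0.3, 0.7
z_comp = solve_lmdp(-np.log(w1 * np.exp(-qf1) + w2 * np.exp(-qf2)))

# Linearity of the Bellman equation gives the compositionality result:
# the composite desirability is the same weighted sum of the primitives'.
assert np.allclose(z_comp, w1 * z1 + w2 * z2)
```

Since the optimal transition law is u*(x'|x) ∝ p(x'|x) z(x'), the composite controller mixes the primitive policies with state-dependent weights w_i z_i(x) / z(x): each primitive contributes in proportion to how well it is doing at the current state (z_i large) and how compatible its goal is with the composite task (w_i), mirroring the mixing rule stated above.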
