Efficient computation of optimal actions

Optimal choice of actions is a fundamental problem relevant to fields as diverse as neuroscience, psychology, economics, computer science, and control engineering. Despite this broad relevance, the abstract setting is similar: we have an agent choosing actions over time, an uncertain dynamical system whose state is affected by those actions, and a performance criterion that the agent seeks to optimize. Solving problems of this kind remains hard, in part, because of overly generic formulations. Here, we propose a more structured formulation that greatly simplifies the construction of optimal control laws in both discrete and continuous domains. An exhaustive search over actions is avoided and the problem becomes linear. This yields algorithms that outperform Dynamic Programming and Reinforcement Learning, and thereby solve traditional problems more efficiently. Our framework also enables computations that were not possible before: composing optimal control laws by mixing primitives, applying deterministic methods to stochastic systems, quantifying the benefits of error tolerance, and inferring goals from behavioral data via convex optimization. Development of a general class of easily solvable problems tends to accelerate progress, as linear systems theory has done, for example. Our framework may have similar impact in fields where optimal choice of actions is relevant.
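The technical core is that, in this structured ("linearly-solvable") formulation, the exponentiated Bellman equation becomes linear in the desirability function z(x) = exp(-v(x)), and the minimization over actions is carried out in closed form rather than by exhaustive search. The following is a minimal numerical sketch of that idea for a first-exit problem on a toy discrete state space; the passive dynamics P, state costs q, and all variable names are illustrative assumptions, not code from the paper.

```python
import numpy as np

# Sketch of a linearly-solvable MDP (LMDP), first-exit formulation.
# States 0..n-2 are interior; state n-1 is an absorbing goal.
# P[i, j] = passive dynamics p(j | i); q[i] = state cost per step.
# The Bellman equation, exponentiated, is linear in the
# desirability z(x) = exp(-v(x)):
#     z = exp(-q) * (P @ z)   on interior states,
#     z = exp(-q)             at the goal.

n = 5
rng = np.random.default_rng(0)
P = rng.random((n, n))
P[-1] = 0.0
P[-1, -1] = 1.0                      # goal is absorbing under passive dynamics
P /= P.sum(axis=1, keepdims=True)    # make rows proper distributions
q = np.full(n, 0.5)                  # uniform running cost (toy choice)
q[-1] = 0.0                          # no cost at the goal

z = np.ones(n)
for _ in range(1000):                # fixed-point iteration; a contraction
    z_new = np.exp(-q) * (P @ z)     # here because exp(-q) < 1 on interior states
    z_new[-1] = np.exp(-q[-1])       # boundary condition at the goal
    if np.max(np.abs(z_new - z)) < 1e-12:
        z = z_new
        break
    z = z_new

v = -np.log(z)                       # optimal cost-to-go
U = P * z[None, :]                   # optimal controlled transitions:
U /= U.sum(axis=1, keepdims=True)    #     u*(j | i) proportional to P[i, j] * z[j]
print("optimal cost-to-go v(x):", v)
print("optimal action at state 0 (a distribution over next states):", U[0])
```

Note that the action minimization never appears explicitly: the optimal control law falls out in closed form as a reweighting of the passive dynamics by z. Linearity also means solutions superpose, which is the source of the compositionality property mentioned above: mixing the desirability functions of component problems yields the optimal control law for a correspondingly mixed terminal cost.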
