A Unifying Framework for Linearly Solvable Control

Recent work has led to the development of an elegant theory of Linearly Solvable Markov Decision Processes (LMDPs) and related Path-Integral Control Problems. Traditionally, LMDPs have been formulated using stochastic policies and a control cost based on the KL divergence. In this paper, we extend this framework to a more general class of divergences: the Rényi divergences, which are parameterized by a continuous parameter α and include the KL divergence as a special case. The resulting control problems can be interpreted as solving a risk-sensitive version of the LMDP problem: for α > 0 we obtain risk-averse behavior (the degree of risk aversion increases with α), for α < 0 we obtain risk-seeking behavior, and the standard LMDP is recovered in the limit α → 0. This work generalizes the recently developed risk-sensitive path-integral control formalism, which can be seen as the continuous-time limit of the results obtained in this paper. To the best of our knowledge, this is the most general theory of linearly solvable control and includes all previous work as special cases. We also present an alternative interpretation of these results as solving a two-player (cooperative or competitive) Markov game. From the linearity follow a number of useful properties, including compositionality of control laws and a path-integral representation of the value function. We demonstrate the usefulness of the framework on control problems with noisy dynamics, where different values of α lead to qualitatively different control behaviors.
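
As background for the divergence family and the linearity mentioned above, a minimal mathematical sketch (the symbols here follow standard conventions and need not match the paper's notation; in particular, the abstract's risk parameter α appears to be shifted so that the KL case sits at α → 0, whereas the standard Rényi order places it at 1): the Rényi divergence of order β between discrete distributions P and Q is

    D_\beta(P \,\|\, Q) \;=\; \frac{1}{\beta - 1} \log \sum_x P(x)^{\beta}\, Q(x)^{1-\beta},

which recovers the KL divergence in the limit β → 1. In the standard KL-cost LMDP that this paper generalizes, the desirability z(x) = exp(−v(x)) of the value function v satisfies the linear Bellman equation

    z(x) \;=\; e^{-q(x)} \sum_{x'} p(x' \mid x)\, z(x'),

where q is the state cost and p the passive dynamics; this linearity is the source of the compositionality and path-integral properties noted in the abstract.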
