Optimal trade-off between exploration and exploitation

Control in an uncertain environment often involves a trade-off between exploratory actions, whose goal is to gather sensory information, and "regular" actions, which exploit the information gathered so far and pursue the task objectives. In principle, both types of action can be modeled by minimizing a single cost function within the framework of stochastic optimal control. In practice, however, this is difficult because the control law must be sensitive to estimation uncertainty, which violates the certainty-equivalence principle. In this paper we formalize the problem in a way that captures the essence of the exploration-exploitation trade-off and yet is amenable to numerical methods for optimal control. The key to our approach is to augment the dynamics of the partially observable plant with the Kalman filter dynamics, thus obtaining a higher-dimensional but fully observable plant. The resulting control laws compare favorably to other, more ad hoc approaches. Our formalism is also suitable for modeling human behavior in tasks that benefit from active exploration.
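To make the augmentation idea concrete, the following is a minimal sketch, not the model used in the paper. It assumes a scalar linear plant with a hypothetical sensor whose noise variance grows with distance from the origin, so that moving toward the origin is an exploratory action. The names `obs_noise_var` and `belief_step` and all parameter values are illustrative assumptions. The augmented state is the Kalman filter mean and variance; the unknown measurement is replaced by the distribution of the innovation, which makes the augmented plant fully observable.

```python
import math


def obs_noise_var(x, r0=0.1, r1=1.0):
    """Hypothetical sensor model (an assumption): noise variance grows
    quadratically with distance from the origin."""
    return r0 + r1 * x * x


def belief_step(x_hat, sigma, u, xi, a=1.0, b=1.0, q=0.01):
    """One step of the augmented, fully observable dynamics.

    x_hat, sigma: Kalman filter mean and variance (the augmented state)
    u:            control input
    xi:           standard normal sample that drives the innovation
    a, b, q:      assumed plant gain, control gain, process noise variance
    """
    # Kalman prediction step
    x_pred = a * x_hat + b * u
    s_pred = a * a * sigma + q

    # Approximation: evaluate the state-dependent noise at the predicted mean
    r = obs_noise_var(x_pred)

    # Kalman update, with the innovation y - x_pred replaced by a sample
    # from its distribution N(0, s_pred + r)
    innov_var = s_pred + r
    gain = s_pred / innov_var
    x_new = x_pred + gain * math.sqrt(innov_var) * xi
    s_new = (1.0 - gain) * s_pred
    return x_new, s_new


if __name__ == "__main__":
    # Staying near the origin (good sensing) versus moving away from it
    _, s_near = belief_step(0.0, 1.0, u=0.0, xi=0.0)
    _, s_far = belief_step(0.0, 1.0, u=2.0, xi=0.0)
    print(f"variance near origin: {s_near:.3f}, far from origin: {s_far:.3f}")
```

Because the variance update depends on the control through the sensor model, a cost defined on the pair `(x_hat, sigma)` can reward actions that reduce uncertainty, which is the coupling that certainty-equivalent controllers ignore.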
