We consider a special case of reinforcement learning in which the environment can be described by a linear system. The states of the environment and the actions available to the agent are represented by real vectors, and the system dynamics are given by a linear equation with a stochastic component. The problem is equivalent to the so-called linear quadratic regulator problem studied in the optimal and adaptive control literature. We propose a learning algorithm for this problem and analyze it in a PAC learning framework. Unlike the algorithms in the adaptive control literature, our algorithm actively explores the environment in order to learn an accurate model of the system faster. We show that the control law produced by our algorithm has, with high probability, a value that is close to that of an optimal policy relative to the magnitude of the initial state of the system. The running time of the algorithm is polynomial in the dimension n of the state space and in the dimension r of the action space when the ratio n/r is a constant.
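For concreteness, a minimal sketch of the standard discrete-time linear quadratic regulator formulation referred to above; the notation here is the conventional one from the control literature and is an assumption, not necessarily the paper's own symbols. The state evolves as
\[
x_{t+1} = A x_t + B u_t + w_t, \qquad x_t \in \mathbb{R}^n,\; u_t \in \mathbb{R}^r,
\]
where \(w_t\) is the zero-mean stochastic component, and the controller seeks a policy minimizing the expected quadratic cost
\[
J = \mathbb{E}\!\left[\sum_{t} x_t^\top Q x_t + u_t^\top R u_t\right]
\]
for given positive (semi)definite weight matrices \(Q\) and \(R\). When \(A\) and \(B\) are known, the optimal control law is linear in the state; the learning problem arises because the agent must estimate the system while controlling it.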