PAC adaptive control of linear systems

We consider a special case of reinforcement learning where the environment can be described by a linear system. The states of the environment and the actions the agent can perform are represented by real vectors, and the system dynamics are given by a linear equation with a stochastic component. The problem is equivalent to the so-called linear quadratic regulator problem studied in the optimal and adaptive control literature. We propose a learning algorithm for this problem and analyze it in a PAC learning framework. Unlike the algorithms in the adaptive control literature, our algorithm actively explores the environment to learn an accurate model of the system faster. We show that the control law produced by our algorithm has, with high probability, a value that is close to that of an optimal policy relative to the magnitude of the initial state of the system. The time taken by the algorithm is polynomial in the dimension n of the state-space and in the dimension r of the action-space when the ratio n/r is a constant.
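
For orientation, the setting described above can be written in the standard textbook form of the linear quadratic regulator. This is a sketch under assumptions: the symbols A, B, w_t, Q, R and K are conventional notation introduced here for illustration and are not taken from the abstract, which fixes only the dimensions n (state) and r (action). The exact cost criterion and noise model used in the paper may differ.

```latex
% Assumed notation (not from the abstract): A, B, w_t, Q, R, K.
% State x_t in R^n, action u_t in R^r, zero-mean random disturbance w_t.
\begin{align}
  x_{t+1} &= A\,x_t + B\,u_t + w_t,
    & A &\in \mathbb{R}^{n \times n},\; B \in \mathbb{R}^{n \times r}, \\
  c(x_t, u_t) &= x_t^{\top} Q\, x_t + u_t^{\top} R\, u_t,
    & Q &\succeq 0,\; R \succ 0, \\
  u_t &= -K\,x_t,
    & K &\in \mathbb{R}^{r \times n}.
\end{align}
```

In this form, learning an accurate model of the system amounts to estimating A and B from observed transitions, and a control law is a linear feedback gain K whose value is compared with that of the optimal gain.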