Exploring, Modelling, and Controlling Discrete Sequential Environments

Imagine being given a new system to control whose structure is unknown and can be ascertained only by input-output experiments. The process of interacting with it can be broken down into three phases which, although conceptually distinct, usually overlap in practice. The breakdown seems to correspond with what we do intuitively when presented with a strange device that we cannot take apart, such as a terminal attached to an unfamiliar computer system. Firstly, input sequences must be synthesized that force the system to exhibit interesting behaviour: the exploration problem. Secondly, the input-output behaviour of the system must be modelled. Finally, one must learn how to control the system by generating sequences of inputs that drive it into desirable states. The purpose of all three phases is to facilitate control of the system (or, as we shall say, the environment): prudent exploration accelerates modelling, and successful modelling assists control. This paper discusses these three components of the learning control problem and summarizes results and techniques that bear on each of them.