Introduction and First Examples

Suppose we are given a system that can be controlled by sequential decisions. The state transitions are random, and we assume that the system state process is Markovian, which means that, given the current state, previous states have no influence on future states. Given the current state of the system (which could be, for example, the wealth of an investor), the controller or decision maker has to choose an admissible action (for example, a possible investment). Once an action is chosen, the system makes a random transition according to a stochastic law (for example, a change in the asset value), which leads to a new state. The task is to control the process in an optimal way.

In order to formulate a reasonable optimization criterion, we assume that each time an action is taken the controller obtains a certain reward. The aim is then to control the system in such a way that the expected total discounted reward is maximized. All these quantities, which have been described here informally, together define a so-called Markov Decision Process. The Markov Decision Process itself is the sequence of random variables $(X_n)$ which describes the stochastic evolution of the system states. Of course, the distribution of $(X_n)$ depends on the chosen actions. Figure 1.1 shows the schematic evolution of a Markov Decision Process.
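To make this control loop concrete, here is a minimal Python sketch of one trajectory of such a controlled process: in each period the controller picks an action, collects a discounted reward, and the system moves to a random next state, so the quantity being maximized is (in illustrative notation) the expectation of $\sum_n \beta^n r(X_n, A_n)$ for a discount factor $\beta$. The two-state system, its transition probabilities, its rewards, and all names used below are invented for this illustration and are not taken from the text.

```python
import random

# Hypothetical two-state MDP: states 0 and 1, actions "invest" and "wait".
# transition[state][action] = probability of moving to state 1 next period.
transition = {
    0: {"invest": 0.7, "wait": 0.4},
    1: {"invest": 0.9, "wait": 0.5},
}

# reward[state][action] = immediate reward for taking the action in the state.
reward = {
    0: {"invest": 1.0, "wait": 0.0},
    1: {"invest": 2.0, "wait": 0.5},
}

def simulate(policy, x0=0, beta=0.95, horizon=50, seed=0):
    """Run one trajectory and return its total discounted reward."""
    rng = random.Random(seed)
    state, total = x0, 0.0
    for n in range(horizon):
        action = policy(state)                      # controller chooses an admissible action
        total += beta ** n * reward[state][action]  # collect the discounted reward
        # random system transition according to the stochastic law
        state = 1 if rng.random() < transition[state][action] else 0
    return total

# A simple stationary policy: always invest, regardless of the state.
print(simulate(lambda s: "invest"))
```

Averaging `simulate` over many seeds would estimate the expected total discounted reward of the chosen policy; the controller's problem is to find the policy that makes this expectation as large as possible.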