On Controlled Finite State Markov Processes with Compact Control Sets

1. In this paper we consider the maximization of the average gain per unit step in controlled finite state Markov chains with compact control sets. In [1] and [2] a stationary optimal strategy was shown to exist under the assumption that the control sets are finite. In [3] it was proved that a stationary optimal strategy exists if the control sets are compact and coincide with the transition probability sets, the gain functions are continuous, and any stationary strategy yields a Markov chain with one ergodic class and without transient states. In the general case, compactness of the control sets and continuity of the gain and transition functions are not sufficient for the existence of an optimal strategy (see [4], Example 3). In this paper we establish the existence of a stationary optimal strategy under the condition that the control sets are compact, the gain functions are upper semi-continuous, the transition functions depend continuously on the controls, and one of the following conditions holds: (i) any stationary strategy yields a Markov chain with one ergodic class, and possibly with transient states (Section 3); (ii) for each state the set of transition probabilities contains a finite set of extreme points (Section 4).

2. Let $X$ be a state space consisting of a finite number of points ($X = \{1, 2, \ldots, s\}$). For each state $x$ there is given a control set $A_x$ ($x = 1, 2, \ldots, s$). On the sets $A_x$ there are defined functions $q_x(a)$ (the gain from the control $a \in A_x$ when the process is in the state $x$) and probability measures $p_x(\cdot \mid a)$ on $X$ (the transition functions under the condition that the process is in the state $x$ and the control $a \in A_x$ is chosen). Set $A = \bigcup_{x=1}^{s} A_x$. […] The collection of these measures for $t = 1, 2, \ldots$ defines the strategy $\pi$. The strategy $\pi$ is called stationary if the measures $\pi_t$ are concentrated at the points $a_t = \varphi(x_{t-1})$, where $\varphi$ is a selector, i.e., a mapping of $X$ into $A$ such that $\varphi(x) \in A_x$. The corresponding strategy is also denoted $\varphi$.
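The objects defined above can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not the construction used in the paper: states are indexed from 0 rather than 1, the two-state model with control sets [0, 1] is invented for the example, the names `induced_chain` and `average_gain` are hypothetical, and the average gain is assumed to be the expected per-step gain averaged over a finite horizon N. The sketch fixes a selector, builds the transition matrix and gain vector of the resulting chain, and evaluates the finite-horizon average gain from each initial state.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def induced_chain(
    states: Sequence[int],
    selector: Dict[int, float],
    p: Callable[[int, float], List[float]],
    q: Callable[[int, float], float],
) -> Tuple[List[List[float]], List[float]]:
    """Transition matrix and gain vector of the chain fixed by a selector."""
    P = [list(p(x, selector[x])) for x in states]   # row x is p_x(. | selector[x])
    g = [q(x, selector[x]) for x in states]         # g[x] is q_x(selector[x])
    return P, g


def average_gain(P: List[List[float]], g: List[float], horizon: int) -> List[float]:
    """Expected gain per step over `horizon` steps, for each initial state."""
    n = len(g)
    total = [0.0] * n
    v = list(g)                                     # v holds P^t g, starting at t = 0
    for _ in range(horizon):
        total = [total[i] + v[i] for i in range(n)]
        v = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return [t / horizon for t in total]


# Hypothetical two-state model; each control set is the compact interval [0, 1].
def p(x: int, a: float) -> List[float]:
    # The control a is the probability of leaving the current state.
    return [1.0 - a, a] if x == 0 else [a, 1.0 - a]


def q(x: int, a: float) -> float:
    # Gain is earned only in state 0 and grows with the control.
    return a if x == 0 else 0.0


states = [0, 1]
selector = {0: 0.25, 1: 0.5}                        # one control per state
P, g = induced_chain(states, selector, p, q)
gains = average_gain(P, g, 10_000)
print(gains)
```

For this selector the chain has a single ergodic class, so the finite-horizon averages from both initial states approach the same value as the horizon grows. In the invented model a larger control in state 0 raises the one-step gain but also shortens the time spent in that state, which is the kind of trade-off a stationary optimal strategy has to resolve.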
Clearly, each selector defines a homogeneous Markov chain. Let AA denote the collection of all strategies defined on the control sets $A_x$. The gain yielded by the strategy $\pi$ is estimated by the function 1 N