Existence of a Stationary Control for a Markov Chain Maximizing the Average Reward

The problem of optimal control of a discrete-time stationary Markov chain with complete state information has been considered by many authors. The case with finitely many states and controls has been thoroughly investigated. Chains with infinitely many states or controls have also been considered under various assumptions on the reward function. In this paper the existence of a control maximizing the average reward is established for Markov chains with a finite number of states and an arbitrary compact set of possible actions in each state. It is assumed that, for every control, the chain has a single ergodic class and no transient states. The proof uses techniques from convex programming and is analogous to the linear programming approach of Wolfe and Dantzig.
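The linear programming formulation alluded to above can be illustrated concretely for a finite action set. The sketch below, using a hypothetical two-state, two-action chain (not an example from the paper), maximizes the average reward over stationary occupation measures: variables x[s,a] satisfy the stationarity (flow-balance) and normalization constraints, and the optimal stationary policy is read off the support of the solution. The unichain assumption (one ergodic class, no transient states under every control) guarantees this LP recovers the optimal gain.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2-state, 2-action chain (illustrative only).
# P[s, a, s'] = transition probability, r[s, a] = one-step reward.
# All transition probabilities are positive, so every stationary
# policy yields a single ergodic class with no transient states.
P = np.array([[[0.9, 0.1], [0.1, 0.9]],
              [[0.1, 0.9], [0.9, 0.1]]])
r = np.array([[1.0, 0.0],
              [2.0, 0.5]])
nS, nA = r.shape

# LP over occupation measures x[s, a] >= 0:
#   maximize  sum_{s,a} x[s,a] r[s,a]
#   subject to  sum_a x[s',a] = sum_{s,a} P[s,a,s'] x[s,a]  for all s'
#               sum_{s,a} x[s,a] = 1
c = -r.flatten()                      # linprog minimizes, so negate
A_eq = np.zeros((nS + 1, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = (s == sp) - P[s, a, sp]
A_eq[nS, :] = 1.0                     # normalization row
b_eq = np.append(np.zeros(nS), 1.0)

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (nS * nA))
x = res.x.reshape(nS, nA)
gain = -res.fun                       # optimal average reward
policy = x.argmax(axis=1)             # stationary deterministic policy
```

For this chain the optimum is attained at an extreme point of the feasible region, which corresponds to a deterministic stationary policy: move toward state 1 when in state 0, then stay near state 1 and collect its higher reward.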