Stationary Policies in Dynamic Programming Models Under Compactness Assumptions

The present work deals with the usual stationary decision model of dynamic programming. The convergence condition imposed on the expected total rewards is general enough to include both the negative (unbounded) case and the positive (unbounded) case. However, the gambling model studied by Dubins and Savage is not covered by the present model. In addition to the convergence condition, a continuity and compactness condition is imposed. The main result states that the supremum of the expected total rewards over all stationary policies is equal to the supremum over all (possibly randomized and non-Markovian) policies.
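
In symbols, the main result can be sketched as follows. The notation is an assumption made here for illustration, since the abstract fixes none: \Pi denotes the set of all policies, F \subseteq \Pi the stationary ones, and V_\pi(s) the expected total reward under policy \pi from an initial state s. Stating the equality for each fixed initial state is likewise an assumption about the form of the result.

```latex
% Assumed notation (not taken from the source):
%   \Pi       all policies, possibly randomized and non-Markovian
%   F         stationary policies, F \subseteq \Pi
%   V_\pi(s)  expected total reward under \pi from initial state s
\[
  \sup_{f \in F} V_{f}(s) \;=\; \sup_{\pi \in \Pi} V_{\pi}(s).
\]
```

Because F \subseteq \Pi, the inequality "\le" holds trivially; the substance of the result is the reverse inequality, which says that nothing is lost by restricting attention to stationary policies.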