Markovian Decision Processes with Uncertain Transition Probabilities

This paper examines Markovian decision processes in which the transition probabilities corresponding to alternative decisions are not known with certainty. The processes are assumed to be finite-state, discrete-time, and stationary. The rewards are time-discounted. Both a game-theoretic and a Bayesian formulation are considered. In the game-theoretic formulation, variants of a policy-iteration algorithm are provided for both the max-min and the max-max cases. For the Bayesian formulation, an implicit enumeration algorithm is discussed in which the max-max and max-min optimal policies provide upper and lower bounds, respectively, on the total expected discounted return. Finally, the paper discusses asymptotically Bayes-optimal policies.
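As a rough illustration of the max-min and max-max criteria, the following sketch computes both value vectors for a toy two-state problem. It makes several assumptions that are not taken from the paper: the uncertainty is modeled as a finite list of candidate transition rows for each state-action pair, successive approximation (value iteration) is used in place of the policy-iteration variants the paper provides, and the function name `robust_value_iteration`, the arrays `P` and `R`, and all numerical values are hypothetical.

```python
from typing import Callable, List

# Hypothetical data layout (an assumption, not the paper's notation):
#   P[s][a] is a list of candidate transition rows for state s, action a;
#           each row is a probability vector over next states.
#   R[s][a] is the immediate reward for taking action a in state s.
Rows = List[List[float]]


def robust_value_iteration(
    P: List[List[Rows]],
    R: List[List[float]],
    beta: float,
    nature: Callable = min,
    tol: float = 1e-10,
    max_iter: int = 10_000,
) -> List[float]:
    """Successive approximation for the max-min (nature=min) or
    max-max (nature=max) discounted value, with discount factor beta < 1."""
    n = len(P)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for s in range(n):
            # The decision maker maximizes over actions; "nature" picks the
            # worst (min) or best (max) candidate transition row.
            best = max(
                R[s][a]
                + beta * nature(
                    sum(p * v[j] for j, p in enumerate(row))
                    for row in P[s][a]
                )
                for a in range(len(P[s]))
            )
            new_v.append(best)
        diff = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if diff < tol:
            break
    return v


# Toy example with made-up numbers: state 1 is absorbing; in state 0,
# action 1 has two candidate transition rows.
P = [
    [[[1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]],  # state 0: actions 0 and 1
    [[[0.0, 1.0]]],                            # state 1: single action
]
R = [[1.0, 0.5], [2.0]]
beta = 0.5

lower = robust_value_iteration(P, R, beta, nature=min)  # max-min values
upper = robust_value_iteration(P, R, beta, nature=max)  # max-max values
```

In this toy instance the max-min value of state 0 is about 2 and the max-max value is about 2.5, so any return computed under a distribution over the two candidate rows would fall between them, which is the bounding role the abstract attributes to the two criteria in the Bayesian formulation.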