Paper information - Blackwell optimality in the class of stationary policies in Markov decision chains with a Borel state space and unbounded rewards

Abstract. This paper is the first part of a study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove the existence of deterministic stationary policies that are Blackwell optimal in the class of all stationary policies, including randomized ones. We also establish a lexicographical policy improvement algorithm leading to Blackwell optimal policies, and the relation between such policies and the Blackwell optimality equation. Our technique combines the weighted norms approach developed in Dekker and Hordijk (1988) for countable models with unbounded rewards with the weak-strong topology approach used in Yushkevich (1997a) for Borel models with bounded rewards.

Arie Hordijk | Alexander Yushkevich