Blackwell optimality in the class of all policies in Markov decision chains with a Borel state space and unbounded rewards

Abstract. This paper is the second part of our study of Blackwell optimal policies in Markov decision chains with a Borel state space and unbounded rewards. We prove that a stationary policy is Blackwell optimal in the class of all history-dependent policies if it is Blackwell optimal in the class of stationary policies. We also develop recurrence and drift conditions that guarantee the ergodicity and integrability assumptions made in the previous paper and that are more suitable for applications. As an example, we study a cash-balance model.
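For context, the optimality criterion the abstract refers to can be stated as follows; this is the standard definition of Blackwell optimality (not spelled out in the abstract itself), with $v_\beta$ denoting the expected total $\beta$-discounted reward:

$$
v_\beta(x,\pi) \;=\; \mathbb{E}_x^{\pi}\!\left[\sum_{t=0}^{\infty} \beta^{t}\, r(x_t, a_t)\right],
$$

A policy $\pi^{*}$ is Blackwell optimal in a class $\Pi$ of policies if it is discount optimal simultaneously for all discount factors sufficiently close to $1$: there exists $\beta_0 \in (0,1)$ such that

$$
v_\beta(x,\pi^{*}) \;\ge\; v_\beta(x,\pi)
\qquad \text{for all } \beta \in (\beta_0, 1),\; x \in X,\; \pi \in \Pi.
$$

Taking $\Pi$ to be the stationary policies or the history-dependent policies gives the two notions compared in the main result: optimality in the smaller (stationary) class is shown to imply optimality in the larger one.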
