Recurrence Conditions for Average and Blackwell Optimality in Denumerable State Markov Decision Chains

In a previous paper, Dekker and Hordijk (1988) presented an operator-theoretical approach to multichain Markov decision processes with a countable state space, compact action sets and unbounded rewards. Conditions were given that guarantee the existence of a Laurent series expansion for the discounted rewards, the existence of average and Blackwell optimal policies, and the existence of solutions to the average and Blackwell optimality equations. While those assumptions were operator oriented and formulated as conditions on the deviation matrix, in this paper we show that the same approach can also be carried out under recurrence conditions. The new conditions are generally easier to verify and are especially well suited to applications in queueing models.
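For context, the Laurent series expansion referred to above is the classical one of Veinott [17]: writing the discount factor as $\beta = (1+\rho)^{-1}$ with interest rate $\rho > 0$, the total discounted reward of a stationary policy $f$ admits, for all sufficiently small $\rho$, an expansion of the form (notation here is illustrative and may differ from the paper's by a normalization factor):

\[
v_\beta(f) \;=\; \sum_{n=-1}^{\infty} \rho^{\,n}\, u_n(f),
\]

where $u_{-1}(f)$ is the average-reward (gain) vector and $u_0(f)$ the bias (deviation) term. Blackwell optimality of a policy then amounts to its being discounted optimal simultaneously for all interest rates $\rho$ in some interval $(0, \rho_0)$, which in turn corresponds to lexicographic optimality of the coefficient sequence $(u_{-1}, u_0, u_1, \ldots)$.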

[1]  Adrian C. Lavercombe, et al.  Recent Developments in Markov Decision Processes , 1982 .

[2]  Flos Spieksma, et al.  The existence of sensitive optimal policies in two multi-dimensional queueing models , 1991 .

[3]  Arie Hordijk, et al.  Dynamic programming and Markov potential theory , 1974 .

[4]  J. B. Lasserre,et al.  Conditions for Existence of Average and Blackwell Optimal Stationary Policies in Denumerable Markov Decision Processes , 1988 .

[5]  A. Federgruen, et al.  A note on simultaneous recurrence conditions on a set of denumerable stochastic matrices , 1978 .

[6]  Arie Hordijk, et al.  Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards , 1988, Math. Oper. Res..

[7]  E. C. Titchmarsh, et al.  The theory of functions , 1933 .

[8]  Arie Hordijk, et al.  Transient policies in discrete dynamic programming: Linear programming including suboptimality tests and additional constraints , 1984, Math. Program..

[9]  A. Federgruen, et al.  Denumerable state semi-Markov decision processes with unbounded costs, average cost criterion , 1979 .

[10]  Rommert Dekker, et al.  Denumerable Markov Decision Chains: Sensitive Optimality Criteria , 1991 .

[11]  Jaap Wessels, et al.  Markov decision processes with unbounded rewards , 1977 .

[12]  J. Wessels,et al.  Markov decision processes and strongly excessive functions , 1978 .

[13]  Paul J. Schweitzer, et al.  Denumerable Undiscounted Semi-Markov Decision Processes with Unbounded Rewards , 1983, Math. Oper. Res..

[14]  Jaap Wessels, et al.  Markov Decision Theory , 1979 .

[15]  Kai Lai Chung, et al.  Markov Chains with Stationary Transition Probabilities , 1961 .

[16]  Manfred Schäl  Estimation and control in discounted stochastic dynamic programming , 1987 .

[17]  A. F. Veinott Discrete Dynamic Programming with Sensitive Discount Optimality Criteria , 1969 .

[18]  Karel Sladký, et al.  Sensitive Optimality Criteria in Countable State Dynamic Programming , 1977, Math. Oper. Res..

[19]  Paul J. Schweitzer, et al.  Solving MDP functional equations by lexicographic optimization , 1982 .

[20]  A. Hordijk, et al.  On ergodicity and recurrence properties of a Markov chain by an application to an open Jackson network , 1992, Advances in Applied Probability.

[21]  Elke Mann, et al.  Optimality equations and sensitive optimality in bounded Markov decision processes , 1985 .

[22]  H. Zijm  The optimality equations in multichain denumerable state Markov decision processes with the average cost criterion: the bounded cost case , 1985 .

[23]  Rommert Dekker, et al.  Denumerable semi-Markov decision chains with small interest rates , 1991 .