Average Cost Optimal Stationary Policies in Infinite State Markov Decision Processes with Unbounded Costs

We deal with infinite state Markov decision processes with unbounded costs. Three simple conditions, based on the optimal discounted value function, guarantee the existence of an expected average cost optimal stationary policy. These conditions are satisfied when there exist a distinguished state of smallest discounted value and a single stationary policy inducing an irreducible, ergodic Markov chain for which the expected cost of a first passage from any state to the distinguished state is finite. A result to aid in verifying this finiteness is also given. Two examples illustrate the ease of applying the criteria.
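As a minimal sketch of how conditions of this type are usually written, assume notation not given in the abstract: a countable state space, transition probabilities $p_{ij}(a)$, nonnegative one-stage costs, the $\alpha$-discounted optimal value $V_\alpha(i)$ for $\alpha \in (0,1)$, a distinguished state $0$, and relative values $h_\alpha(i) = V_\alpha(i) - V_\alpha(0)$. A common formulation in this literature (not a verbatim restatement of the paper's assumptions) is:

  (C1)  $V_\alpha(i) < \infty$ for every state $i$ and every $\alpha \in (0,1)$;
  (C2)  there is a constant $N \ge 0$ with $h_\alpha(i) \ge -N$ for all $i$ and $\alpha$;
  (C3)  there are finite $M(i) \ge 0$ with $h_\alpha(i) \le M(i)$ for all $\alpha$, and for each $i$ at least one action $a$ with $\sum_j p_{ij}(a)\,M(j) < \infty$.

Under conditions of this form, the standard argument shows that a limit point of $\alpha$-discounted optimal policies as $\alpha \uparrow 1$ is average cost optimal, with optimal average cost $g = \lim_{\alpha \uparrow 1} (1-\alpha)V_\alpha(0)$ along a suitable subsequence. Taking the distinguished state to be one of smallest discounted value makes $h_\alpha \ge 0$, so a bound of the form (C2) holds trivially, while a single stationary policy inducing an irreducible, ergodic chain with finite expected first-passage costs to that state is what typically supplies the upper bounds $M(i)$.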
