Uniqueness and Stability of Optimal Policies of Finite State Markov Decision Processes

In this paper we consider infinite-horizon, discrete-time optimal control of Markov decision processes (MDPs) with finite state spaces and compact action sets. We restrict attention to unicost MDPs, a class that contains all weakly communicating MDPs. Unicost MDPs are characterized as those MDPs for which the single optimality equation has a solution. We address the uniqueness and stability of minimizing Markov actions. Our main result asserts that, when the set of unicost MDPs is endowed with a certain natural metric under which it is complete, the class of MDPs with essentially unique and stable minimizing Markov actions contains the intersection of countably many open dense sets (and hence, by the Baire category theorem, is itself dense). Thus the property of having essentially unique and stable minimizing Markov actions is generic for unicost MDPs.
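
For orientation, the single optimality equation mentioned above is commonly written in the following average-cost form. This is a sketch in assumed notation, not necessarily the paper's own: $S$ is the finite state space, $A(i)$ the compact action set at state $i$, $c(i,a)$ the one-step cost, $p(j \mid i,a)$ the transition probabilities, $\lambda$ a scalar (the optimal average cost, the same from every initial state), and $h$ a vector on $S$.

```latex
% Single (average-cost) optimality equation; all symbols are assumed notation.
\[
  \lambda + h(i) \;=\; \min_{a \in A(i)}
  \Bigl[\, c(i,a) + \sum_{j \in S} p(j \mid i,a)\, h(j) \Bigr],
  \qquad i \in S .
\]
% A minimizing Markov action at state i is any a in A(i) attaining the minimum.
```

Under this reading, an MDP is unicost when some pair $(\lambda, h)$ solves the equation, and the uniqueness question concerns the actions that attain the minimum at each state.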
