Overtaking Optimality for Markov Decision Chains

In a finite Markov decision chain, let $V(n, \Pi)_i$ denote the expected income earned when starting in state $i$ and following the (stationary or nonstationary) policy $\Pi$ for $n$ epochs. Call policy $\Pi$ overtaking optimal if $\liminf_{n \to \infty} \{V(n, \Pi)_i - V(n, \Lambda)_i\} \ge 0$ for every policy $\Lambda$ and every state $i$. This paper provides conditions under which a certain stationary policy is overtaking optimal, together with tests for these conditions. The conditions hold, for instance, when Blackwell's (Blackwell, D. 1962. Discrete dynamic programming. Ann. Math. Statist. 33 719–726.) policy iteration routine terminates unambiguously, without ties, with a single policy whose transition matrix is not cyclic.
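The definition can be explored numerically for stationary policies. The sketch below is illustrative only and is not taken from the paper: the function names `n_epoch_income` and `overtaking_gaps`, and the two-state example, are assumptions made here. It computes $V(n, \Pi)$ by the recursion $V(k+1) = r + P\,V(k)$ with $V(0) = 0$ and tabulates the gap $V(n, \Pi)_i - V(n, \Lambda)_i$ over a few horizons. A finite table can only suggest the behaviour of the lim inf; it does not establish it.

```python
def n_epoch_income(P, r, n):
    """Expected income over n epochs under a stationary policy.

    P is the policy's transition matrix (list of rows), r its one-epoch
    reward vector. Uses V(0) = 0 and V(k+1) = r + P V(k).
    """
    size = len(r)
    v = [0.0] * size
    for _ in range(n):
        v = [r[i] + sum(P[i][j] * v[j] for j in range(size)) for i in range(size)]
    return v


def overtaking_gaps(P_pi, r_pi, P_lam, r_lam, horizons):
    """Return V(n, Pi) - V(n, Lambda), state by state, for each n in horizons."""
    gaps = []
    for n in horizons:
        v_pi = n_epoch_income(P_pi, r_pi, n)
        v_lam = n_epoch_income(P_lam, r_lam, n)
        gaps.append([a - b for a, b in zip(v_pi, v_lam)])
    return gaps


# Hypothetical two-state example in which both policies alternate between
# the states (a cyclic transition matrix) but collect the reward of 2 in
# different states.
P_cyclic = [[0.0, 1.0], [1.0, 0.0]]
r_pi = [2.0, 0.0]    # Pi earns 2 in state 0, nothing in state 1
r_lam = [0.0, 2.0]   # Lambda earns nothing in state 0, 2 in state 1

for n, gap in zip(range(1, 5), overtaking_gaps(P_cyclic, r_pi, P_cyclic, r_lam, range(1, 5))):
    print(n, gap)
```

In this example the gap alternates between [2, -2] and [0, 0]. Starting from state 1 it keeps returning to -2, so over these horizons Π does not overtake Λ from that state, and by symmetry Λ does not overtake Π from state 0. This kind of oscillation is one way to see why a condition excluding cyclic transition matrices is natural for the criterion.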

[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.

[2] D. Blackwell. Discrete Dynamic Programming, 1962.

[3] W. Barry. On the Iterative Method of Dynamic Programming on a Finite Space Discrete Time Markov Process, 1965.

[4] A. F. Veinott. On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting, 1966.

[5] D. Gale. On Optimal Development in a Multi-Sector Economy, 1967.

[6] B. L. Miller, et al. An Optimality Condition for Discrete Dynamic Programming with No Discounting, 1968.

[7] Bennett L. Fox, et al. Scientific Applications: An Algorithm for Identifying the Ergodic Subchains and Transient States of a Stochastic Matrix, 1967, Commun. ACM.

[8] S. Lippman. On the Set of Optimal Policies in Discrete Dynamic Programming, 1968.

[9] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, 1969.

[10] Eric V. Denardo, et al. Computing a Bias-Optimal Policy in a Discrete-Time Markov Decision Problem, 1970, Oper. Res.

[11] Arthur F. Veinott, et al. Computing a Graph's Period Quadratically by Node Condensation, 1973, Discret. Math.

[12] J. Bather. Optimal Decision Procedures for Finite Markov Chains. Part I: Examples, 1973, Advances in Applied Probability.

[13] Karel Sladký, et al. On the Set of Optimal Controls for Markov Chains with Rewards, 1974, Kybernetika.

[14] Eric V. Denardo, et al. Periods of Connected Networks and Powers of Nonnegative Matrices, 1977, Math. Oper. Res.

[15] U. Rothblum. Normalized Markov Decision Chains. II: Optimality of Nonstationary Policies, 1977.