Paper Information - Overtaking Optimality for Markov Decision Chains

Overtaking Optimality for Markov Decision Chains

In a finite Markov decision chain, let $V(n, \Pi)_i$ denote the expected income earned by starting at state $i$ and following (stationary or nonstationary) policy $\Pi$ for $n$ epochs. Call policy $\Pi$ overtaking optimal if $\liminf_{n \to \infty} \{V(n, \Pi)_i - V(n, \Lambda)_i\} \ge 0$ for every policy $\Lambda$ and every state $i$. This paper provides conditions under which a certain stationary policy is overtaking optimal. Tests for these conditions are provided. These conditions hold, for instance, when Blackwell's (Blackwell, D. 1962. Discrete dynamic programming. Ann. Math. Statist. 33 719–726.) policy iteration routine terminates unambiguously, without ties, with a single policy whose transition matrix is not cyclic.
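The definition above can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes both policies are stationary (the definition also covers nonstationary ones), so that each is described by a transition matrix and a one-epoch income vector, and it uses the recursion V(0) = 0, V(k+1) = r + P V(k). The function names `n_epoch_income` and `income_gaps` and the two-state example data are hypothetical and chosen only for illustration.

```python
def n_epoch_income(P, r, n):
    """Expected n-epoch income from each starting state for a stationary policy.

    P is the policy's transition matrix (list of rows) and r is its
    one-epoch income vector. Uses V(0) = 0 and V(k+1) = r + P V(k).
    """
    m = len(r)
    v = [0.0] * m
    for _ in range(n):
        v = [r[i] + sum(P[i][j] * v[j] for j in range(m)) for i in range(m)]
    return v


def income_gaps(P_pi, r_pi, P_lam, r_lam, horizon):
    """List of per-state gaps V(n, pi)_i - V(n, lam)_i for n = 1..horizon."""
    gaps = []
    for n in range(1, horizon + 1):
        v_pi = n_epoch_income(P_pi, r_pi, n)
        v_lam = n_epoch_income(P_lam, r_lam, n)
        gaps.append([a - b for a, b in zip(v_pi, v_lam)])
    return gaps


# Hypothetical two-state example: state 1 is absorbing and earns nothing.
# Policy pi: in state 0, earn 1 and move to state 1.
P_pi = [[0.0, 1.0], [0.0, 1.0]]
r_pi = [1.0, 0.0]
# Policy lam: in state 0, earn 0.5 and stay in state 0 with probability 0.5.
P_lam = [[0.5, 0.5], [0.0, 1.0]]
r_lam = [0.5, 0.0]

gaps = income_gaps(P_pi, r_pi, P_lam, r_lam, horizon=10)
for n, gap in enumerate(gaps, start=1):
    print(n, gap)
```

In this example the gap at state 0 halves at every epoch and stays nonnegative, which is consistent with the overtaking inequality for this pair of policies. A finite horizon can only suggest the behavior of the lim inf; it does not establish it, and a full check would have to range over every competing policy.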

Uriel G. Rothblum | Eric V. Denardo

[1] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.

[2] D. Blackwell. Discrete Dynamic Programming, 1962.

[3] W. Barry. On the Iterative Method of Dynamic Programming on a Finite Space Discrete Time Markov Process, 1965.

[4] A. F. Veinott. On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting, 1966.

[5] D. Gale. On Optimal Development in a Multi-Sector Economy, 1967.

[6] B. L. Miller, et al. An Optimality Condition for Discrete Dynamic Programming with no Discounting, 1968.

[7] Bennett L. Fox, et al. Scientific Applications: An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix, 1967, Commun. ACM.

[8] S. Lippman. On the set of optimal policies in discrete dynamic programming, 1968.

[9] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, 1969.

[10] Eric V. Denardo, et al. Computing a Bias-Optimal Policy in a Discrete-Time Markov Decision Problem, 1970, Oper. Res.

[11] Arthur F. Veinott, et al. Computing a graph's period quadratically by node condensation, 1973, Discret. Math.

[12] J. Bather. Optimal decision procedures for finite markov chains. Part I: Examples, 1973, Advances in Applied Probability.

[13] Karel Sladký, et al. On the set of optimal controls for Markov chains with rewards, 1974, Kybernetika.

[14] Eric V. Denardo, et al. Periods of Connected Networks and Powers of Nonnegative Matrices, 1977, Math. Oper. Res.

[15] U. Rothblum. Normalized Markov Decision Chains. II: Optimality of Nonstationary Policies, 1977.