Improved Procedures for Eliminating Suboptimal Actions in Markov Programming by the Use of Contraction Properties

In applying the method of value iteration one frequently observes that the relative values for the n-th stage converge very rapidly with increasing n, whereas the absolute values converge slowly (discount factor β near one) or even diverge (β ≥ 1). MacQueen [8], [9] and others use this fact to derive good bounds for the value of the infinite-horizon problem and, in addition, to eliminate suboptimal actions in the early stages. This elimination can be improved by the use of an upper bound δ for the convergence rate. When δ < β the improvement has two effects: it reduces computing time, and it allows the method to be applied to the finite-horizon case with β ≥ 1.
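As an illustration of the kind of procedure the abstract describes, the following sketch runs value iteration on a small discounted MDP, computes MacQueen-style bounds on the infinite-horizon value from the n-th stage value differences, and eliminates an action once even its upper bound falls below the lower bound on the optimal value. The model, the function name, and the fixed iteration count are invented for this example, and the standard rate bound δ = β is used rather than the sharper δ < β of the paper.

```python
import numpy as np

def macqueen_value_iteration(r, P, beta, n_iter=50):
    """Value iteration with MacQueen-style bounds and a simple
    suboptimal-action elimination test (illustrative sketch only).

    r[i, a]    : immediate reward for action a in state i
    P[a, i, j] : transition probability i -> j under action a
    beta       : discount factor, 0 < beta < 1
    """
    n_states, n_actions = r.shape
    v = np.zeros(n_states)
    active = np.ones((n_states, n_actions), dtype=bool)  # not yet eliminated
    lo = hi = v
    for _ in range(n_iter):
        # Q-values for all (state, action) pairs: r + beta * P v
        q = r + beta * np.einsum('aij,j->ia', P, v)
        v_new = np.where(active, q, -np.inf).max(axis=1)
        c = v_new - v                          # n-th stage value differences
        span_lo = beta / (1 - beta) * c.min()
        span_hi = beta / (1 - beta) * c.max()
        lo, hi = v_new + span_lo, v_new + span_hi  # bounds on v*
        # eliminate action a in state i if even its upper bound lies
        # below the lower bound on the optimal value in i
        active &= (q + span_hi) >= (v_new + span_lo)[:, None]
        v = v_new
    return v, lo, hi, active

# A 2-state, 2-action example in which action 0 dominates in both states;
# both actions keep the system in the current state, so v* = [10, 20].
r = np.array([[1.0, 0.0], [2.0, 0.5]])
P = np.stack([np.eye(2), np.eye(2)])
v, lo, hi, active = macqueen_value_iteration(r, P, beta=0.9)
```

After enough iterations the dominated action is eliminated in both states, while the bounds lo and hi bracket the true values throughout.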

[1] J. L. Mott, "Conditions for the Ergodicity of Non-homogeneous Finite Markov Chains," Proceedings of the Royal Society of Edinburgh, Section A: Mathematical and Physical Sciences, 1957.

[2] M. Bartlett et al., "Weak ergodicity in non-homogeneous Markov chains," Mathematical Proceedings of the Cambridge Philosophical Society, 1958.

[3] R. A. Howard et al., Dynamic Programming and Markov Processes, 1960.

[4] J. G. Kemeny et al., Finite Markov Chains, 1960.

[5] D. White et al., "Dynamic programming, Markov chains, and the method of successive approximations," 1963.

[6] W. S. Jewell et al., "Markov-Renewal Programming. I: Formulation, Finite Return Models," 1963.

[7] W. Jewell, "Markov-Renewal Programming. II: Infinite Return Models, Example," 1963.

[8] J. MacQueen, "A Modified Dynamic Programming Method for Markovian Decision Problems," 1966.

[9] J. MacQueen et al., "Letter to the Editor: A Test for Suboptimal Actions in Markovian Decision Problems," Operations Research, 1967.

[10] J. Shapiro, "Turnpike Planning Horizons for a Markovian Decision Model," 1968.

[11] E. L. Porteus, "Some Bounds for Discounted Sequential Decision Processes," 1971.

[12] T. E. Morton, "Technical Note: On the Asymptotic Convergence Rate of Cost Differences for Markovian Decision Processes," Operations Research, 1971.

[13] Y. Dirickx, "Deterministic Discrete Dynamic Programming with Discount Factor Greater than One: Structure of Optimal Policies," 1973.

[14] N. Hastings et al., "Tests for Suboptimal Actions in Discounted Markov Programming," 1973.

[15] R. C. Grinold, "Technical Note: Elimination of Suboptimal Actions in Markov Decision Problems," Operations Research, 1973.

[16] H. Schellhaas, "Zur Extrapolation in Markoffschen Entscheidungsmodellen mit Diskontierung" [On extrapolation in Markovian decision models with discounting], Zeitschrift für Operations Research, 1974.

[17] E. Seneta, Non-Negative Matrices, 1975.

[18] K. Hinderer, "Estimates for finite-stage dynamic programs," 1976.

[19] D. Reetz, "A decision exclusion algorithm for a class of Markovian Decision Processes," Mathematical Methods of Operations Research, 1976.

[21] K. Hinderer et al., "An Improvement of J. F. Shapiro's Turnpike Theorem for the Horizon of Finite Stage Discrete Dynamic Programs," 1977.