Bounds and good policies in stationary finite-stage Markovian decision problems

A stationary Markovian decision model with general state and action spaces is considered, in which the transition probabilities are relaxed to bounded transition measures, a generalization that is useful in many applications. New and improved bounds are given for the optimal value of stationary problems with a large planning horizon when either only a few iteration steps are carried out or, in addition, a solution of the infinite-stage problem is known. Similar estimates are obtained for the quality of policies composed of nearly optimal decisions from the first few steps or from the infinite-stage solution.

Gerhard Hübner

[1] S. Ross. Arbitrary State Markovian Decision Processes, 1968.

[2] K. Hinderer. On Approximate Solutions of Finite-Stage Dynamic Programs, 1978.

[3] M. Schäl. A selection theorem for optimization problems, 1974.

[4] Paul J. Schweitzer. Multiple Policy Improvements in Undiscounted Markov Renewal Programming, 1971, Oper. Res.

[5] Evan L. Porteus. Some Bounds for Discounted Sequential Decision Processes, 1971.

[6] J. A. E. E. van Nunen. Contracting Markov decision processes, 1976.

[7] David Siegmund, et al. Great expectations: The theory of optimal stopping, 1971.

[8] J. MacQueen. A Modified Dynamic Programming Method for Markovian Decision Problems, 1966.

[9] H. H. Schaefer. Banach Lattices and Positive Operators, 1975.

[10] Paul J. Schweitzer, et al. The Asymptotic Behavior of Undiscounted Value Iteration in Markov Decision Problems, 1977, Math. Oper. Res.

[11] Helmut Schellhaas, et al. Zur Extrapolation in Markoffschen Entscheidungsmodellen mit Diskontierung, 1974, Z. Oper. Research.

[12] G. Hübner. Improved Procedures for Eliminating Suboptimal Actions in Markov Programming by the Use of Contraction Properties, 1977.

[13] A. F. Veinott. Discrete Dynamic Programming with Sensitive Discount Optimality Criteria, 1969.

[14] Karl Hinderer. Instationäre dynamische Optimierung bei schwachen Voraussetzungen über die Gewinnfunktionen, 1971.

[15] J. A. E. E. van Nunen, et al. The action elimination algorithm for Markov decision processes, 1976.

[16] K. Hinderer, et al. Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter, 1970.

[17] T. Parthasarathy, et al. Optimal Plans for Dynamic Programming Problems, 1976, Math. Oper. Res.

[18] U. Rieder. Measurable selection theorems for optimization problems, 1978.

[19] K. Hinderer. Estimates for finite-stage dynamic programs, 1976.

[20] N. A. J. Hastings. Technical Note - Bounds on the Gain of a Markov Decision Process, 1971, Oper. Res.

[21] Harold J. Kushner, et al. Accelerated procedures for the solution of discrete Markov control problems, 1971.

[22] K. Hinderer, et al. An Improvement of J. F. Shapiro’s Turnpike Theorem for the Horizon of Finite Stage Discrete Dynamic Programs, 1977.

[23] N. Hastings, et al. Note: A Test for Nonoptimal Actions in Undiscounted Finite Markov Decision Chains, 1976.

[24] J. Wessels. Markov programming by successive approximations with respect to weighted supremum norms, 1976, Advances in Applied Probability.

[25] J. Shapiro. Turnpike Planning Horizons for a Markovian Decision Model, 1968.

[26] Ronald A. Howard, et al. Dynamic Programming and Markov Processes, 1960.

[27] S. Ross. Non-Discounted Denumerable Markovian Decision Models, 1968.

[28] J. Wessels, et al. Stopping Times and Markov Programming, 1977.

[29] U. Rieder. Bayesian dynamic programming, 1975, Advances in Applied Probability.

[30] Evan L. Porteus. Bounds and Transformations for Discounted Finite Markov Decision Chains, 1975, Oper. Res.

[31] J. MacQueen, et al. Letter to the Editor - A Test for Suboptimal Actions in Markovian Decision Problems, 1967, Oper. Res.

[32] S. Lippman. Semi-Markov Decision Processes with Unbounded Rewards, 1973.

[33] N. A. J. Hastings, et al. Some Notes on Dynamic Programming and Replacement, 1968.

[34] T. Morton, et al. Discounting, Ergodicity and Convergence for Markov Decision Processes, 1977.

[35] S. Pliska. Optimization of Multitype Branching Processes, 1976.

[36] Evan L. Porteus, et al. Technical Note - Accelerated Computation of the Expected Discounted Return in a Markov Chain, 1978, Oper. Res.