Bounds and good policies in stationary finite-stage Markovian decision problems

A stationary Markovian decision model with general state and action spaces is considered, in which the transition probabilities are relaxed to bounded transition measures, a generalization that is useful in many applications. New and improved bounds are given for the optimal value of stationary problems with a large planning horizon when either only a few iteration steps are carried out or, in addition, a solution of the infinite-stage problem is known. Similar estimates are obtained for the quality of policies composed of nearly optimal decisions from the first few steps or from the infinite-stage solution.
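To illustrate the kind of estimate meant here, the following minimal sketch bounds the value of a long-horizon problem after only a few value-iteration steps. It is not the paper's bound: it uses only the elementary sup-norm contraction argument and assumes a finite state and action space, a nonnegative transition kernel whose row sums are at most `beta` < 1 (a simple instance of a bounded transition measure), and a terminal value of zero. The names `value_iteration`, `horizon_bound`, `r`, `q` and `beta`, and the numbers in the example, are hypothetical.

```python
def value_iteration(r, q, steps):
    """Return [V_0, ..., V_steps] with V_0 = 0 and
    V_n(s) = max_a ( r[s][a] + sum_t q[s][a][t] * V_{n-1}(t) )."""
    n_states = len(r)
    values = [[0.0] * n_states]
    for _ in range(steps):
        prev = values[-1]
        values.append([
            max(
                r[s][a] + sum(q[s][a][t] * prev[t] for t in range(n_states))
                for a in range(len(r[s]))
            )
            for s in range(n_states)
        ])
    return values


def horizon_bound(values, beta, horizon):
    """Crude sup-norm bound on |V_horizon - V_n|, where n = len(values) - 1 >= 1.

    Since |V_{k+1} - V_k| <= beta * |V_k - V_{k-1}| in sup norm, summing the
    geometric series gives d_n * beta * (1 - beta**(horizon - n)) / (1 - beta),
    with d_n = |V_n - V_{n-1}|.
    """
    n = len(values) - 1
    d = max(abs(x - y) for x, y in zip(values[-1], values[-2]))
    return d * beta * (1.0 - beta ** (horizon - n)) / (1.0 - beta)


# Hypothetical two-state, two-action example; every row of q sums to at most 0.9.
r = [[1.0, 0.5], [0.0, 2.0]]
q = [[[0.5, 0.4], [0.9, 0.0]],
     [[0.0, 0.9], [0.3, 0.3]]]
beta = 0.9

few = value_iteration(r, q, 5)     # only a few iteration steps
full = value_iteration(r, q, 50)   # the long-horizon problem, for comparison
gap = max(abs(x - y) for x, y in zip(full[-1], few[-1]))
bound = horizon_bound(few, beta, 50)
```

Here `gap` is the true distance between the 5-stage and 50-stage values and `bound` is the estimate available after five steps. The improved bounds of the paper are sharper than this geometric-series estimate.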

[1]  S. Ross Arbitrary State Markovian Decision Processes , 1968 .

[2]  K. Hinderer ON APPROXIMATE SOLUTIONS OF FINITE-STAGE DYNAMIC PROGRAMS , 1978 .

[3]  M. Schäl A selection theorem for optimization problems , 1974 .

[4]  Paul J. Schweitzer Multiple Policy Improvements in Undiscounted Markov Renewal Programming , 1971, Oper. Res..

[5]  Evan L. Porteus Some Bounds for Discounted Sequential Decision Processes , 1971 .

[6]  J. A. E. E. van Nunen Contracting Markov decision processes , 1976 .

[7]  David Siegmund,et al.  Great expectations: The theory of optimal stopping , 1971 .

[8]  J. MacQueen A MODIFIED DYNAMIC PROGRAMMING METHOD FOR MARKOVIAN DECISION PROBLEMS , 1966 .

[9]  H. H. Schaefer Banach Lattices and Positive Operators , 1975 .

[10]  Paul J. Schweitzer,et al.  The Asymptotic Behavior of Undiscounted Value Iteration in Markov Decision Problems , 1977, Math. Oper. Res..

[11]  Helmut Schellhaas,et al.  Zur Extrapolation in Markoffschen Entscheidungsmodellen mit Diskontierung , 1974, Z. Oper. Research.

[12]  G. Hübner Improved Procedures for Eliminating Suboptimal Actions in Markov Programming by the Use of Contraction Properties , 1977 .

[13]  A. F. Veinott Discrete Dynamic Programming with Sensitive Discount Optimality Criteria , 1969 .

[14]  Karl Hinderer Instationäre dynamische Optimierung bei schwachen Voraussetzungen über die Gewinnfunktionen , 1971 .

[15]  J.A.E.E. van Nunen,et al.  The action elimination algorithm for Markov decision processes , 1976 .

[16]  K. Hinderer,et al.  Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter , 1970 .

[17]  T. Parthasarathy,et al.  Optimal Plans for Dynamic Programming Problems , 1976, Math. Oper. Res..

[18]  U. Rieder Measurable selection theorems for optimization problems , 1978 .

[19]  K. Hinderer Estimates for finite-stage dynamic programs , 1976 .

[20]  N. A. J. Hastings Technical Note - Bounds on the Gain of a Markov Decision Process , 1971, Oper. Res..

[21]  Harold J. Kushner,et al.  Accelerated procedures for the solution of discrete Markov control problems , 1971 .

[22]  K. Hinderer,et al.  An Improvement of J. F. Shapiro’s Turnpike Theorem for the Horizon of Finite Stage Discrete Dynamic Programs , 1977 .

[23]  N. Hastings,et al.  Note---A Test for Nonoptimal Actions in Undiscounted Finite Markov Decision Chains , 1976 .

[24]  J. Wessels Markov programming by successive approximations by respect to weighted supremum norms , 1976, Advances in Applied Probability.

[25]  J. Shapiro Turnpike Planning Horizons for a Markovian Decision Model , 1968 .

[26]  Ronald A. Howard,et al.  Dynamic Programming and Markov Processes , 1960 .

[27]  S. Ross NON-DISCOUNTED DENUMERABLE MARKOVIAN DECISION MODELS , 1968 .

[28]  J Jaap Wessels,et al.  Stopping Times and Markov Programming , 1977 .

[29]  U. Rieder Bayesian dynamic programming , 1975, Advances in Applied Probability.

[30]  Evan L. Porteus Bounds and Transformations for Discounted Finite Markov Decision Chains , 1975, Oper. Res..

[31]  J. MacQueen,et al.  Letter to the Editor - A Test for Suboptimal Actions in Markovian Decision Problems , 1967, Oper. Res..

[32]  S. Lippman Semi-Markov Decision Processes with Unbounded Rewards , 1973 .

[33]  N. A. J. Hastings,et al.  Some Notes on Dynamic Programming and Replacement , 1968 .

[34]  T. Morton,et al.  Discounting, Ergodicity and Convergence for Markov Decision Processes , 1977 .

[35]  S. Pliska Optimization of Multitype Branching Processes , 1976 .

[36]  Evan L. Porteus,et al.  Technical Note - Accelerated Computation of the Expected Discounted Return in a Markov Chain , 1978, Oper. Res..