Computing the discounted return in Markov and semi-Markov chains

This paper addresses the problem of computing the expected discounted return in finite Markov and semi-Markov chains. The objective is to shed light on two questions. First, which iterative methods hold the most promise? Second, when are iterative methods preferable to Gaussian elimination? A set of twenty-seven randomly generated problems is used to compare the performance of the methods considered. The following observations apply to the problems generated here. Gauss-Seidel is not preferred to Pre-Jacobi in general. However, if the matrix is reordered in a certain way and the author's row-sum extrapolation is used, then Gauss-Seidel is preferred. Transforming a semi-Markov problem into a Markov one using a transformation due to Schweitzer does not yield improved performance. A method analogous to symmetric successive overrelaxation (SSOR) in numerical analysis yields improved performance, especially when the row-sum extrapolation is used only sparingly. This method is then compared to Gaussian elimination and is found to be superior for most of the problems generated.
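As a rough illustration of the comparison described in the abstract, the sketch below solves the standard discounted-return equation v = r + beta * P v for a small Markov chain in three ways: plain successive approximation, a Gauss-Seidel sweep, and Gaussian elimination on (I - beta * P) v = r. Everything here is an assumption made for illustration: the function names, the 2-state chain, the discount factor, and the tolerance are hypothetical and do not come from the paper. Identifying plain successive approximation with the abstract's Pre-Jacobi method is likewise an assumption. The row-sum extrapolation, the matrix reordering, the SSOR-like method, and the semi-Markov case are not implemented.

```python
def successive_approximation(P, r, beta, tol=1e-10, max_iter=10_000):
    """Iterate v <- r + beta * P v, using only values from the previous sweep."""
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [r[i] + beta * sum(P[i][j] * v[j] for j in range(n))
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


def gauss_seidel(P, r, beta, tol=1e-10, max_iter=10_000):
    """Gauss-Seidel on (I - beta * P) v = r: updated components are reused at once."""
    n = len(r)
    v = [0.0] * n
    for _ in range(max_iter):
        delta = 0.0
        for i in range(n):
            off_diag = sum(P[i][j] * v[j] for j in range(n) if j != i)
            new = (r[i] + beta * off_diag) / (1.0 - beta * P[i][i])
            delta = max(delta, abs(new - v[i]))
            v[i] = new
        if delta < tol:
            break
    return v


def gaussian_elimination(P, r, beta):
    """Direct solve of (I - beta * P) v = r with partial pivoting."""
    n = len(r)
    # Augmented matrix [I - beta * P | r].
    A = [[(1.0 if i == j else 0.0) - beta * P[i][j] for j in range(n)] + [r[i]]
         for i in range(n)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for i in range(k + 1, n):
            factor = A[i][k] / A[k][k]
            for j in range(k, n + 1):
                A[i][j] -= factor * A[k][j]
    # Back substitution.
    v = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (A[i][n] - s) / A[i][i]
    return v


# Hypothetical 2-state chain: transition matrix, one-step rewards, discount factor.
P = [[0.5, 0.5],
     [0.2, 0.8]]
r = [1.0, 2.0]
beta = 0.9

v_sa = successive_approximation(P, r, beta)
v_gs = gauss_seidel(P, r, beta)
v_ge = gaussian_elimination(P, r, beta)
print(v_sa, v_gs, v_ge)
```

On a toy problem like this, all three approaches should agree to within the tolerance; the paper's question is which of them is cheaper on larger problems, which this sketch does not attempt to measure.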

[1]  M. S. Bartlett,et al.  The ergodic properties of non-homogeneous finite Markov chains , 1956, Mathematical Proceedings of the Cambridge Philosophical Society.

[2]  H. Markowitz The Elimination form of the Inverse and its Application to Linear Programming , 1957 .

[3]  M. Bartlett,et al.  Weak ergodicity in non-homogeneous Markov chains , 1958, Mathematical Proceedings of the Cambridge Philosophical Society.

[4]  Ronald A. Howard,et al.  Dynamic Programming and Markov Processes , 1960 .

[5]  K. Fan Note on M-matrices , 1960 .

[6]  D. White,et al.  Dynamic programming, Markov chains, and the method of successive approximations , 1963 .

[7]  William S. Jewell,et al.  Markov-Renewal Programming. I: Formulation, Finite Return Models , 1963 .

[8]  J. MacQueen A Modified Dynamic Programming Method for Markovian Decision Problems , 1966 .

[9]  E. Denardo Contraction Mappings in the Theory Underlying Dynamic Programming , 1967 .

[10]  A. F. Veinott Extreme points of Leontief substitution systems , 1968 .

[11]  J. MacQueen,et al.  On computing the expected discounted return in a Markov chain , 1970 .

[12]  P. Schweitzer Iterative solution of the functional equations of undiscounted Markov renewal programming , 1971 .

[13]  Evan L. Porteus Some Bounds for Discounted Sequential Decision Processes , 1971 .

[14]  Louis A. Hageman,et al.  Iterative Solution of Large Linear Systems. , 1971 .

[15]  Thomas E. Morton Technical Note - On the Asymptotic Convergence Rate of Cost Differences for Markovian Decision Processes , 1971, Oper. Res..

[16]  Evan L. Porteus Bounds and Transformations for Discounted Finite Markov Decision Chains , 1975, Oper. Res..

[17]  Andrew B. Whinston,et al.  Optimization over Leontief substitution systems , 1975 .

[18]  Steven A. Lippman,et al.  Applying a New Device in the Optimization of Exponential Queuing Systems , 1975, Oper. Res..

[19]  Jo van Nunen,et al.  A set of successive approximation methods for discounted Markovian decision problems , 1976, Math. Methods Oper. Res..

[20]  V. Conrad,et al.  A faster SSOR algorithm , 1976 .

[21]  J. Meijerink,et al.  An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix , 1977 .

[22]  T. Morton,et al.  Discounting, Ergodicity and Convergence for Markov Decision Processes , 1977 .

[23]  Evan L. Porteus,et al.  Technical Note - Accelerated Computation of the Expected Discounted Return in a Markov Chain , 1978, Oper. Res..

[24]  P. Schweitzer Contraction mappings underlying undiscounted Markov decision problems—II , 1978 .

[25]  Richard F. Serfozo,et al.  Technical Note - An Equivalence Between Continuous and Discrete Time Markov Decision Processes , 1979, Oper. Res..

[26]  P. Schweitzer,et al.  Geometric convergence of value-iteration in multichain Markov decision problems , 1979, Advances in Applied Probability.

[27]  Evan L. Porteus Improved iterative computation of the expected discounted return in Markov and semi-Markov chains , 1980, Z. Oper. Research.