Bounding reward measures of Markov models using Markov decision processes

SUMMARY For a Markov reward process whose transition rates and rewards are known only through upper and lower bounds, a new approach to bounding the expected reward is presented. A previous paper defined sharp bounds for this problem but proposed only an inefficient and unstable algorithm; this paper presents algorithms that compute the bounds by interpreting the problem as a Markov decision process. In this way, the well-known value and policy iteration algorithms can be applied to compute reward bounds in a stable and fairly efficient way. Different numerical techniques for computing the reward bounds are presented. Copyright © 2011 John Wiley & Sons, Ltd.
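The central idea of the abstract, treating each admissible choice of transition parameters and rewards as an action of a Markov decision process so that value (or policy) iteration optimises over them, can be illustrated with a minimal sketch. The discrete-time, discounted formulation below, the function name `value_iteration_bounds`, and the finite set of candidate transition rows per state are illustrative assumptions; the paper's actual model (continuous-time rates and its specific reward measure) may differ.

```python
import numpy as np

def value_iteration_bounds(P_choices, r_choices, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Bound the expected discounted reward of a Markov reward process whose
    transition probabilities and rewards are only known up to a finite set of
    admissible choices per state.

    P_choices[s] : array of shape (A_s, n), candidate transition rows for state s
    r_choices[s] : array of shape (A_s,),   candidate one-step rewards for state s

    Interpreting each admissible choice as an MDP action, value iteration with a
    max (resp. min) over the choices yields an upper (resp. lower) bound.
    """
    n = len(P_choices)
    V_up = np.zeros(n)
    V_lo = np.zeros(n)
    for _ in range(max_iter):
        # Bellman update: optimise over the admissible parameter choices per state.
        V_up_new = np.array([np.max(r_choices[s] + gamma * P_choices[s] @ V_up) for s in range(n)])
        V_lo_new = np.array([np.min(r_choices[s] + gamma * P_choices[s] @ V_lo) for s in range(n)])
        diff = max(np.max(np.abs(V_up_new - V_up)), np.max(np.abs(V_lo_new - V_lo)))
        V_up, V_lo = V_up_new, V_lo_new
        if diff < tol:
            break
    return V_lo, V_up

if __name__ == "__main__":
    # Hypothetical 2-state example: interval-valued transition probabilities,
    # discretised here into two admissible rows per state.
    P_choices = [
        np.array([[0.6, 0.4], [0.7, 0.3]]),
        np.array([[0.2, 0.8], [0.3, 0.7]]),
    ]
    r_choices = [np.array([1.0, 1.2]), np.array([0.0, 0.1])]
    lo, up = value_iteration_bounds(P_choices, r_choices)
    print("lower bound per state:", lo)
    print("upper bound per state:", up)
```

Maximising over the admissible choices gives an upper bound on the expected reward of any Markov reward process consistent with the parameter bounds, while minimising gives a lower bound; policy iteration could be substituted for the value iteration loop in the usual way.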
