Computing Approximate Solutions to Markov Renewal Programs with Continuous State Spaces

Value iteration and policy iteration are two well-known computational methods for solving Markov renewal decision processes. Value iteration converges linearly, while policy iteration (typically) converges quadratically and is therefore more attractive in principle. However, when the state space is very large (or continuous), the latter requires solving a large linear system (or integral equation) at each iteration and becomes impractical. We propose an “approximate policy iteration” method, targeted especially at systems with continuous or large state spaces whose Bellman (expected cost-to-go) function is relatively smooth (or piecewise smooth). Such systems occur quite frequently in practice. The method is based on approximating the Bellman function by a linear combination of an a priori fixed set of basis functions. At each policy iteration, we build a linear system in the coefficients of these basis functions and solve this system approximately. We give special attention to a particular case of finite element approximation in which the Bellman function is expressed directly as a convex combination of its values at a finite set of grid points. In the first part of the paper, we survey and slightly extend some basic results concerning convergence, approximation, and bounds. Throughout the paper, we consider both the discounted and average cost criteria. Our models are infinite horizon and stationary.
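To make the scheme concrete, the following is a minimal numerical sketch of approximate policy iteration with a piecewise-linear (“hat function”) finite element basis on a grid, so that the approximate Bellman function at any state is a convex combination of its values at the two bracketing grid points. It is not the paper's algorithm verbatim: the state space, dynamics, cost function, action set, and all parameters below are illustrative assumptions for a hypothetical discounted problem on [0, 1].

```python
# Sketch: approximate policy iteration with a hat-function (finite element)
# basis on a grid. All model ingredients below are hypothetical examples.
import numpy as np

GAMMA = 0.9                       # discount factor (assumption)
GRID = np.linspace(0.0, 1.0, 21)  # grid points x_1, ..., x_m
ACTIONS = [0.0, 0.5]              # small illustrative action set

def cost(x, a):
    # Hypothetical one-stage cost: quadratic state cost plus an action cost.
    return x**2 + 0.1 * a

def next_states(x, a, noise=np.linspace(-0.1, 0.1, 5)):
    # Hypothetical transitions: drift toward the action, plus additive noise
    # discretized by a few equally weighted quadrature nodes.
    return np.clip(0.8 * x + 0.2 * a + noise, 0.0, 1.0)

def hat_weights(x):
    # Interpolation weights: V(x) is a convex combination of the values of V
    # at the two grid points bracketing x (the finite element special case).
    w = np.zeros(len(GRID))
    j = np.searchsorted(GRID, x)
    if j == 0:
        w[0] = 1.0
    elif j >= len(GRID):
        w[-1] = 1.0
    else:
        t = (x - GRID[j - 1]) / (GRID[j] - GRID[j - 1])
        w[j - 1], w[j] = 1.0 - t, t
    return w

def transition_row(x, a):
    # Row of the projected transition matrix: average interpolation weights
    # of the sampled next states.
    return np.mean([hat_weights(y) for y in next_states(x, a)], axis=0)

def evaluate(policy):
    # Policy evaluation: solve the m x m linear system
    # (I - gamma * P_mu) v = c_mu for the grid values of V.
    m = len(GRID)
    P = np.array([transition_row(x, policy[i]) for i, x in enumerate(GRID)])
    c = np.array([cost(x, policy[i]) for i, x in enumerate(GRID)])
    return np.linalg.solve(np.eye(m) - GAMMA * P, c)

def improve(v):
    # Policy improvement: minimize the one-step lookahead at each grid point.
    policy = []
    for x in GRID:
        q = [cost(x, a) + GAMMA * transition_row(x, a) @ v for a in ACTIONS]
        policy.append(ACTIONS[int(np.argmin(q))])
    return np.array(policy)

policy = np.zeros(len(GRID))  # start from the "do nothing" policy
for it in range(20):
    v = evaluate(policy)
    new_policy = improve(v)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy
print("policy iterations:", it + 1)
print("values at grid points:", np.round(v, 3))
```

Because the hat weights are nonnegative and sum to one, the projected matrix P_mu is stochastic and (I - gamma * P_mu) is invertible for gamma < 1, so each evaluation step is a well-posed finite linear system; in the paper's setting this system would itself be solved only approximately.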
