A K-step look-ahead analysis of value iteration algorithms for Markov decision processes

Abstract We introduce and analyze a general look-ahead approach for value iteration algorithms used to solve both discounted and undiscounted Markov decision processes. The approach combines the value-oriented concept with multiple adaptive relaxation factors, yielding accelerated procedures that outperform the separate use of either value-oriented updates or relaxation. We discuss the evaluation and computational aspects of the method, suggest practical guidelines for implementation, and indicate how the method can be enhanced by incorporating Phase 0, action elimination procedures, and parallel processing. The method has been successfully applied to several real problems. We present numerical results supporting the superiority of the developed approach over other value iteration variants, particularly in undiscounted cases.
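To make the two ingredients of the abstract concrete, the following is a minimal sketch of value iteration on a small discounted MDP that combines a relaxation factor with a k-step value-oriented inner loop (in the spirit of modified policy iteration). All names, the single scalar `omega` (the paper uses multiple adaptive factors), and the specific stopping rule are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def value_iteration_relaxed(P, r, gamma, omega=1.0, k=1, tol=1e-8, max_iter=10_000):
    """Sketch: value iteration with a relaxation factor and a k-step
    value-oriented inner loop (illustrative, not the paper's exact method).

    P     : (A, S, S) transition matrices, one per action
    r     : (A, S) expected one-step rewards
    gamma : discount factor in (0, 1)
    omega : relaxation factor applied to each update
    k     : number of sweeps performed with the current greedy policy
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = r + gamma * (P @ V)            # (A, S) action values
        pi = Q.argmax(axis=0)              # greedy policy
        TV = Q.max(axis=0)                 # one Bellman sweep
        V_new = V + omega * (TV - V)       # relaxed update
        # value-oriented step: k-1 extra sweeps holding the greedy policy fixed
        P_pi = P[pi, np.arange(S)]         # (S, S) transitions under pi
        r_pi = r[pi, np.arange(S)]         # (S,) rewards under pi
        for _ in range(k - 1):
            TV = r_pi + gamma * (P_pi @ V_new)
            V_new = V_new + omega * (TV - V_new)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, pi
        V = V_new
    return V, pi
```

With `omega = 1` and `k = 1` this reduces to standard value iteration; larger `k` trades cheaper policy-fixed sweeps for full maximization sweeps, and `omega != 1` over- or under-relaxes each update.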
