A K-step look-ahead analysis of value iteration algorithms for Markov decision processes

Abstract We introduce and analyze a general look-ahead approach for value iteration algorithms used to solve both discounted and undiscounted Markov decision processes. The approach combines the value-oriented concept with multiple adaptive relaxation factors, yielding accelerated procedures that outperform the separate use of either value-oriented updates or relaxation. We discuss the evaluation and computational aspects of the method, suggest practical guidelines for implementation, and indicate how the method can be enhanced by incorporating Phase 0, action elimination procedures, and parallel processing. The method has been successfully applied to several real problems. We present numerical results supporting the superiority of the developed approach over other value iteration variants, particularly in undiscounted cases.
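To make the two ingredients of the abstract concrete, the following is a minimal sketch of value iteration on a small discounted MDP that combines a relaxation factor with a k-step value-oriented inner loop (in the spirit of modified policy iteration). All names, the single scalar `omega` (the paper uses multiple adaptive factors), and the specific stopping rule are illustrative assumptions, not the authors' exact procedure:

```python
import numpy as np

def value_iteration_relaxed(P, r, gamma, omega=1.0, k=1, tol=1e-8, max_iter=10_000):
    """Sketch: value iteration with a relaxation factor and a k-step
    value-oriented inner loop (illustrative, not the paper's exact method).

    P     : (A, S, S) transition matrices, one per action
    r     : (A, S) expected one-step rewards
    gamma : discount factor in (0, 1)
    omega : relaxation factor applied to each update
    k     : number of sweeps performed with the current greedy policy
    """
    A, S, _ = P.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        Q = r + gamma * (P @ V)            # (A, S) action values
        pi = Q.argmax(axis=0)              # greedy policy
        TV = Q.max(axis=0)                 # one Bellman sweep
        V_new = V + omega * (TV - V)       # relaxed update
        # value-oriented step: k-1 extra sweeps holding the greedy policy fixed
        P_pi = P[pi, np.arange(S)]         # (S, S) transitions under pi
        r_pi = r[pi, np.arange(S)]         # (S,) rewards under pi
        for _ in range(k - 1):
            TV = r_pi + gamma * (P_pi @ V_new)
            V_new = V_new + omega * (TV - V_new)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, pi
        V = V_new
    return V, pi
```

With `omega = 1` and `k = 1` this reduces to standard value iteration; larger `k` trades cheaper policy-fixed sweeps for full maximization sweeps, and `omega != 1` over- or under-relaxes each update.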
