A value iteration method for undiscounted multichain Markov decision processes

This paper proposes a value iteration method which finds an ε-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes the maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is ε-optimal.
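To place the abstract in context, the sketch below illustrates the simplest member of this family of methods: plain relative value iteration for an undiscounted (average-reward) MDP with the span-based stopping rule, which yields an ε-optimal policy in the unichain, aperiodic case. This is only a minimal sketch under those assumptions; it is not the paper's multichain aggregation/optimal-stopping construction, and all names, shapes, and the stopping bound are illustrative choices.

```python
import numpy as np

def relative_value_iteration(P, r, eps=1e-6, max_iter=10_000):
    """Relative value iteration for an average-reward (undiscounted) MDP.

    P : array of shape (A, S, S), transition matrix for each action
    r : array of shape (A, S), expected one-step reward for each action
    Stops when span(Tv - v) < eps; for unichain, aperiodic problems the
    greedy policy at that point is eps-optimal, and (min, max) of Tv - v
    bound the optimal gain.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # One-step lookahead: Q[a, s] = r[a, s] + sum_j P[a, s, j] * v[j]
        Q = r + P @ v
        Tv = Q.max(axis=0)
        delta = Tv - v
        if delta.max() - delta.min() < eps:   # span stopping rule
            break
        v = Tv - Tv[0]                        # renormalize so v stays bounded
    policy = Q.argmax(axis=0)                 # greedy (eps-optimal) policy
    gain_bounds = (delta.min(), delta.max())  # bounds on the optimal gain
    return policy, gain_bounds, v
```

In the multichain case the span need not contract and the optimal gain can differ across recurrent classes, so this simple stopping rule is not enough; the paper's contribution is a value iteration scheme that still terminates finitely with an ε-optimal policy in that setting.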
