A value iteration method for undiscounted multichain Markov decision processes

This paper proposes a value iteration method which finds an ε-optimal policy of an undiscounted multichain Markov decision process in a finite number of iterations. The undiscounted multichain Markov decision process is reduced to an aggregated Markov decision process, which utilizes the maximal gains of undiscounted Markov decision sub-processes and is formulated as an optimal stopping problem. As a preliminary, sufficient conditions are presented under which a policy is ε-optimal.
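To place the abstract in context, the sketch below illustrates the simplest member of this family of methods: plain relative value iteration for an undiscounted (average-reward) MDP with the span-based stopping rule, which yields an ε-optimal policy in the unichain, aperiodic case. This is only a minimal sketch under those assumptions; it is not the paper's multichain aggregation/optimal-stopping construction, and all names, shapes, and the stopping bound are illustrative choices.

```python
import numpy as np

def relative_value_iteration(P, r, eps=1e-6, max_iter=10_000):
    """Relative value iteration for an average-reward (undiscounted) MDP.

    P : array of shape (A, S, S), transition matrix for each action
    r : array of shape (A, S), expected one-step reward for each action
    Stops when span(Tv - v) < eps; for unichain, aperiodic problems the
    greedy policy at that point is eps-optimal, and (min, max) of Tv - v
    bound the optimal gain.
    """
    A, S, _ = P.shape
    v = np.zeros(S)
    for _ in range(max_iter):
        # One-step lookahead: Q[a, s] = r[a, s] + sum_j P[a, s, j] * v[j]
        Q = r + P @ v
        Tv = Q.max(axis=0)
        delta = Tv - v
        if delta.max() - delta.min() < eps:   # span stopping rule
            break
        v = Tv - Tv[0]                        # renormalize so v stays bounded
    policy = Q.argmax(axis=0)                 # greedy (eps-optimal) policy
    gain_bounds = (delta.min(), delta.max())  # bounds on the optimal gain
    return policy, gain_bounds, v
```

In the multichain case the span need not contract and the optimal gain can differ across recurrent classes, so this simple stopping rule is not enough; the paper's contribution is a value iteration scheme that still terminates finitely with an ε-optimal policy in that setting.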
