Paper information - Time aggregated Markov decision processes via standard dynamic programming

This note addresses the time aggregation approach to ergodic finite-state Markov decision processes (MDPs) with uncontrollable states. We propose using time aggregation as an intermediate step toward constructing a transformed MDP whose state space consists solely of the controllable states. The proposed approach simplifies the iterative search for the optimal solution by eliminating the need to define an equivalent parametric function, and it yields a problem that can be solved by simpler, standard MDP algorithms.

Marcelo D. Fragoso | Edilson Fernandes de Arruda

[1] Peter B. Luh, et al. Incremental Value Iteration for Time-Aggregated Markov-Decision Processes, 2007, IEEE Transactions on Automatic Control.

[2] Zhiyuan Ren, et al. Markov decision Processes with fractional costs, 2005, IEEE Transactions on Automatic Control.

[3] E. Fainberg. Sufficient Classes of Strategies in Discrete Dynamic Programming I: Decomposition of Randomized Strategies and Embedded Models, 1987.

[4] Rommert Dekker, et al. Joint replacement in an operational planning phase, 1996.

[5] Marcelo D. Fragoso, et al. Standard dynamic programming applied to time aggregated Markov decision processes, 2009, Proceedings of the 48th IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.

[6] Martin L. Puterman, et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.

[7] Adam Shwartz, et al. Exact finite approximations of average-cost countable Markov decision processes, 2007, Autom.

[8] Ronald A. Howard, et al. Dynamic Probabilistic Systems, 1971.

[9] Zhiyuan Ren, et al. A time aggregation approach to Markov decision processes, 2002, Autom.