Paper information

B^2RTDP: An Efficient Solution for Bounded-Parameter Markov Decision Process

Bounded-parameter Markov decision processes (BMDPs) can be used to model sequential decision problems in which the transition probabilities are not completely known and are instead given by intervals. One criterion used to solve this kind of problem is maximin, i.e., choosing the best action under the worst-case scenario. Algorithms that solve BMDPs with this criterion include interval value iteration and an extension of real-time dynamic programming (Robust-LRTDP). In this paper, we introduce a new algorithm, named B^2RTDP, which is also based on real-time dynamic programming but makes a different choice of the next state to visit, using upper and lower bounds on the optimal value function. An empirical evaluation shows that the algorithm converges faster than the state-of-the-art algorithms for solving BMDPs.
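The abstract names two ingredients. The first is a maximin backup over interval transition probabilities. The second is a next-state choice guided by upper and lower bounds on the optimal value function. The Python sketch that follows illustrates both, under stated assumptions. The abstract does not give B^2RTDP's actual selection rule, so `choose_next_state` borrows the Bounded RTDP idea of weighting each successor by its probability times its bound gap. That rule, the discounted-reward setting, the dictionary encoding, and every function name here are hypothetical and are not taken from the paper. The worst-case distribution uses the common greedy construction: start each successor at its lower bound, then give the leftover probability mass to the lowest-valued successors first.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical BMDP encoding (not from the paper):
# (state, action) -> list of (next_state, p_low, p_high) probability intervals.
Succ = List[Tuple[str, float, float]]
Model = Dict[Tuple[str, str], Succ]


def worst_case_probs(succ: Succ, value: Dict[str, float]) -> Dict[str, float]:
    """Distribution inside the intervals that minimises sum_s' P(s') * value[s'].

    Every successor starts at its lower bound; the leftover mass goes to
    successors in increasing order of value, capped by each upper bound.
    """
    probs = {t: lo for t, lo, _ in succ}
    remaining = 1.0 - sum(probs.values())
    for t, lo, hi in sorted(succ, key=lambda x: value[x[0]]):
        extra = min(hi - lo, remaining)
        probs[t] += extra
        remaining -= extra
    return probs


def q_value(model, rewards, s, a, value, gamma) -> float:
    """Reward plus discounted expected value under the worst-case distribution."""
    probs = worst_case_probs(model[(s, a)], value)
    return rewards[(s, a)] + gamma * sum(p * value[t] for t, p in probs.items())


def backup(model, rewards, actions, s, value, gamma) -> float:
    """Maximin backup: best action against the worst-case transition model."""
    return max(q_value(model, rewards, s, a, value, gamma) for a in actions[s])


def choose_next_state(model, s, a, upper, lower, rng) -> Optional[str]:
    """Assumed BRTDP-style rule: weight successors by P(s') * (U(s') - L(s')).

    P is the worst-case distribution w.r.t. the upper bound (an assumption).
    Returns None when the weighted gap is negligible, which ends the trial.
    """
    probs = worst_case_probs(model[(s, a)], upper)
    weights = {t: p * (upper[t] - lower[t]) for t, p in probs.items()}
    if sum(weights.values()) <= 1e-9:
        return None
    return rng.choices(list(weights), weights=list(weights.values()))[0]


def trial(model, rewards, actions, start, upper, lower, gamma, rng, max_depth=50):
    """One trial: back up both bounds along a path guided by the bound gap."""
    s = start
    for _ in range(max_depth):
        if not actions[s]:  # absorbing state: value fixed at 0
            break
        upper[s] = backup(model, rewards, actions, s, upper, gamma)
        lower[s] = backup(model, rewards, actions, s, lower, gamma)
        # Greedy action w.r.t. the optimistic (upper) bound.
        greedy = max(actions[s],
                     key=lambda a: q_value(model, rewards, s, a, upper, gamma))
        nxt = choose_next_state(model, s, greedy, upper, lower, rng)
        if nxt is None:
            break
        s = nxt


def example_bmdp():
    """Toy 3-state BMDP with rewards in [0, 1] and an absorbing 'goal'."""
    model: Model = {
        ("s0", "a"): [("s1", 0.6, 0.9), ("s0", 0.1, 0.4)],
        ("s0", "b"): [("goal", 0.2, 0.5), ("s0", 0.5, 0.8)],
        ("s1", "a"): [("goal", 0.7, 1.0), ("s1", 0.0, 0.3)],
    }
    rewards = {("s0", "a"): 0.0, ("s0", "b"): 0.5, ("s1", "a"): 1.0}
    actions = {"s0": ["a", "b"], "s1": ["a"], "goal": []}
    return model, rewards, actions


if __name__ == "__main__":
    gamma = 0.9
    model, rewards, actions = example_bmdp()
    # Rewards lie in [0, 1], so 1 / (1 - gamma) and 0 are valid initial bounds.
    upper = {"s0": 1.0 / (1 - gamma), "s1": 1.0 / (1 - gamma), "goal": 0.0}
    lower = {"s0": 0.0, "s1": 0.0, "goal": 0.0}
    rng = random.Random(0)
    for _ in range(200):
        trial(model, rewards, actions, "s0", upper, lower, gamma, rng)
    print(upper["s0"], lower["s0"])
```

The maximin operator is monotone, so backing up an upper bound keeps it an upper bound, and the same holds for the lower bound. A trial can therefore stop once the weighted gap at the successors is negligible. In the toy model, both bounds at `s0` should meet near 0.5/0.55 ≈ 0.909, the value at which action `b` is optimal against the worst case.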

Karina Valdivia Delgado | Leliane Nunes de Barros | Fernando L. Fussuma
