Paper Information - An ε-Optimal Grid-Based Algorithm for Partially Observable Markov Decision Processes

An ε-Optimal Grid-Based Algorithm for Partially Observable Markov Decision Processes

We present an ε-optimal grid-based algorithm for POMDPs that is tractable in ε, the discount factor, and the maximum absolute value of the cost function, but exponential in the dimension of the state space. To the best of our knowledge, this is the first optimal grid-based algorithm for POMDPs: all other optimal algorithms that we know of are based on Sondik's representation of the value function. We also propose a robustness criterion for grid-based algorithms and show that the new algorithm is robust in this sense.

Blai Bonet

[1] E. J. Sondik, et al. The Optimal Control of Partially Observable Markov Decision Processes, 1971.

[2] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.

[3] William S. Lovejoy, et al. Computationally Feasible Bounds for Partially Observed Markov Decision Processes, 1991, Oper. Res.

[4] Leslie Pack Kaelbling, et al. Acting Optimally in Partially Observable Stochastic Domains, 1994, AAAI.

[5] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[6] John N. Tsitsiklis, et al. Neuro-Dynamic Programming, 1996, Encyclopedia of Machine Learning.

[7] Bert Fristedt, et al. A Modern Approach to Probability Theory, 1996.

[8] Michael L. Littman, et al. Algorithms for Sequential Decision Making, 1996.

[9] Wenju Liu, et al. A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains, 1997, J. Artif. Intell. Res.

[10] Michael L. Littman, et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes, 1997, UAI.

[11] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.