An ε-Optimal Grid-Based Algorithm for Partially Observable Markov Decision Processes

We present an ε-optimal grid-based algorithm for POMDPs that is tractable in ε, the discount factor, and the maximum absolute value of the cost function, but exponential in the dimension of the state space. To the best of our knowledge, this is the first optimal grid-based algorithm for POMDPs: all other optimal algorithms that we know of are based on Sondik's representation of the value function. We also propose a robustness criterion for grid-based algorithms and show that the new algorithm is robust in this sense.
