Accelerating Point-Based POMDP Algorithms through Successive Approximations of the Optimal Reachable Space

Point-based approximation algorithms have drastically improved the speed of POMDP planning. This paper presents a new point-based POMDP algorithm called SARSOP. Like earlier point-based algorithms, SARSOP performs value iteration at a set of sampled belief points; however, it focuses on sampling near the space reachable from an initial belief point under the optimal policy. Since neither the optimal policy nor the optimal reachable space is known in advance, SARSOP builds successive approximations of this space through sampling and pruning. In our experiments, the new algorithm solved difficult POMDP problems with more than 10,000 states. Its running time is competitive with the fastest existing point-based algorithm on most problems and faster by many times on some. Our approach is complementary to existing point-based algorithms and can be integrated with them to improve their performance.
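To make "value iteration at a set of sampled belief points" concrete, the following is a minimal sketch of the standard point-based backup that algorithms in this family perform at a single belief. It does not reproduce SARSOP's sampling or pruning. The function name `point_backup`, the nested-list model layout (`T[a][s][s2]`, `Z[a][s2][o]`, `R[a][s]`), and the two-state toy model are assumptions made for illustration and are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def point_backup(b, Gamma, T, Z, R, gamma):
    """Back up the alpha-vector set Gamma at belief b.

    Assumed (hypothetical) layout:
      T[a][s][s2] : probability of reaching s2 from s under action a
      Z[a][s2][o] : probability of observing o after action a lands in s2
      R[a][s]     : immediate reward for action a in state s
    Returns the new alpha vector that is best at b, and its action.
    """
    n_states = len(b)
    n_obs = len(Z[0][0])
    best_alpha, best_action, best_value = None, None, float("-inf")
    for a in range(len(T)):
        alpha_a = list(R[a])
        for o in range(n_obs):
            # Project every alpha vector back through (a, o).
            projected = [
                [
                    sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2]
                        for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in Gamma
            ]
            # Keep only the projection with the highest value at b.
            g = max(projected, key=lambda v: dot(b, v))
            for s in range(n_states):
                alpha_a[s] += gamma * g[s]
        value = dot(b, alpha_a)
        if value > best_value:
            best_alpha, best_action, best_value = alpha_a, a, value
    return best_alpha, best_action


# Toy model (illustrative only): 2 states, 2 actions, 1 uninformative observation.
T = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: swap states
]
Z = [
    [[1.0], [1.0]],
    [[1.0], [1.0]],
]
R = [
    [1.0, 0.0],  # action 0 is rewarded only in state 0
    [0.0, 0.0],  # action 1 is never rewarded
]

belief = [1.0, 0.0]
alpha, action = point_backup(belief, [[1.0, 0.0]], T, Z, R, gamma=0.5)
```

Each backup costs time proportional to the number of actions, observations, and alpha vectors, and it is repeated at every sampled belief. This is why the choice of which beliefs to sample, the focus of this paper, has such a large effect on running time.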
