Theoretical Analysis of Heuristic Search Methods for Online POMDPs

Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and have proven remarkably scalable, but without the theoretical guarantees of their offline counterparts. It therefore seems natural to unify offline and online techniques, preserving the theoretical properties of the former while exploiting the scalability of the latter. In this paper, we provide theoretical guarantees for an anytime POMDP algorithm that aims to reduce the error made by approximate offline value iteration algorithms through an efficient online search procedure. The algorithm uses search heuristics, derived from an error analysis of lookahead search, to guide the online search towards reachable beliefs with the greatest potential to reduce error. We provide a general theorem showing that these search heuristics are admissible and lead to complete and ε-optimal algorithms. This is, to the best of our knowledge, the strongest theoretical result available for online POMDP solution methods. We also provide empirical evidence that our approach is practical and can find (provably) near-optimal solutions in reasonable time.
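To make the search strategy concrete, the sketch below illustrates the kind of anytime error-minimization loop the abstract describes: fringe beliefs are expanded in order of their estimated contribution to the error at the current belief, i.e. the bound gap U(b) − L(b) discounted by depth and weighted by the probability of reaching b. This is a minimal sketch under stated assumptions, not the paper's actual algorithm: the names (`BeliefNode`, `expand`, the bound initializations) are illustrative, and the termination test is simplified.

```python
import heapq
import itertools
import time

GAMMA = 0.95  # discount factor (illustrative value)

class BeliefNode:
    """A node in the online search tree over beliefs reachable from the current belief."""
    def __init__(self, belief, depth=0, prob=1.0, lower=0.0, upper=1.0):
        self.belief = belief  # the belief state itself (representation-agnostic here)
        self.depth = depth    # depth d(b) below the current belief
        self.prob = prob      # probability of reaching b from the current belief
        self.lower = lower    # offline lower bound L(b), assumed supplied by value iteration
        self.upper = upper    # offline upper bound U(b), assumed supplied by value iteration

    def error_contribution(self):
        # Heuristic priority for expanding b: its bound gap U(b) - L(b),
        # discounted by depth and weighted by reachability probability.
        return (GAMMA ** self.depth) * self.prob * (self.upper - self.lower)

def anytime_search(root, expand, budget_s=1.0, epsilon=1e-3):
    """Expand fringe nodes in order of potential error reduction until the
    remaining error estimate drops below epsilon or the time budget is spent.
    `expand(node)` is an assumed callback returning the node's child BeliefNodes."""
    counter = itertools.count()  # tie-breaker so the heap never compares nodes
    fringe = [(-root.error_contribution(), next(counter), root)]
    deadline = time.monotonic() + budget_s
    while fringe and time.monotonic() < deadline:
        neg_err, _, node = heapq.heappop(fringe)
        if -neg_err <= epsilon:  # best remaining contribution is already small
            break
        for child in expand(node):  # one-step lookahead from `node`
            heapq.heappush(fringe,
                           (-child.error_contribution(), next(counter), child))
        # A full implementation would also back up the children's bounds
        # toward the root after each expansion; omitted here for brevity.
    return root
```

In a complete implementation, the backed-up bounds would tighten U − L at the root monotonically, so the ε test applies to the current belief itself; that back-up step is what the admissibility and ε-optimality guarantees in the paper are stated over.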
