Towards Efficient Computation of Error Bounded Solutions in POMDPs: Expected Value Approximation and Dynamic Disjunctive Beliefs

While POMDPs (partially observable Markov decision processes) are a popular computational model with wide-ranging applications, the computational cost of generating optimal policies is prohibitive. Researchers continue to develop ever-more efficient algorithms, yet many applications demand that such algorithms bound any loss in policy quality incurred in the pursuit of efficiency. To address this challenge, we present two new techniques. The first approximates in the value space to obtain solutions efficiently for a pre-specified error bound; unlike existing techniques, it guarantees that the resulting policy meets this bound, and it does not require costly computations to determine the policy's quality loss. The second prunes large tracts of belief space that are unreachable, allowing faster policy computation without any sacrifice in optimality. The two techniques are complementary to existing optimal policy generation algorithms, and their combination provides solutions with tight error bounds efficiently in domains where competing algorithms fail to provide such bounds.
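To make the first idea concrete, below is a minimal sketch of one way value-space approximation with a pre-specified error bound can work: given a set of alpha-vectors representing a POMDP value function, greedily keep a subset whose value at every sampled belief is within epsilon of the full set's value. This is an illustrative simplification written for this summary, not the paper's EVA algorithm; the function name epsilon_prune, the greedy set-cover strategy, and the use of a sampled belief set are all assumptions.

```python
import numpy as np

def epsilon_prune(alpha_vectors, beliefs, epsilon):
    """Greedily select a subset of alpha-vectors whose value at each
    sampled belief is within `epsilon` of the full set's value.

    alpha_vectors : list of 1-D numpy arrays (one entry per state)
    beliefs       : list of 1-D numpy arrays (probability distributions)
    epsilon       : maximum allowed value loss at any sampled belief

    Illustrative sketch only; not the EVA algorithm from the paper.
    """
    A = np.vstack(alpha_vectors)          # |V| x |S| matrix of vectors
    B = np.vstack(beliefs)                # |B| x |S| matrix of beliefs
    values = B @ A.T                      # value of each vector at each belief
    best = values.max(axis=1)             # optimal value at each belief

    kept = []
    covered = np.zeros(len(beliefs), dtype=bool)
    # Greedy set cover: repeatedly keep the vector that covers (to within
    # epsilon) the largest number of still-uncovered beliefs. Terminates
    # because each belief's maximizing vector always covers it.
    while not covered.all():
        covers = values >= (best - epsilon)[:, None]   # |B| x |V| booleans
        gains = (covers & ~covered[:, None]).sum(axis=0)
        pick = int(gains.argmax())
        kept.append(alpha_vectors[pick])
        covered |= covers[:, pick]
    return kept
```

By construction, acting greedily with respect to the kept subset loses at most epsilon in value at any of the sampled beliefs, so the error bound is fixed up front rather than estimated after the fact. The second technique is complementary: by discarding regions of belief space that the dynamics can never reach, it shrinks the set of beliefs over which such guarantees must be enforced, with no loss in optimality.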
