VDCBPI: an Approximate Scalable Algorithm for Large POMDPs

Existing algorithms for discrete partially observable Markov decision processes (POMDPs) can at best solve problems with a few thousand states because of two important sources of intractability: the curse of dimensionality and the complexity of policy space. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value-Directed Compression (VDC) technique [13] with Bounded Policy Iteration (BPI) [15]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.
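
To give a feel for the curse of dimensionality mentioned above, the following sketch counts the flat states of a network in which every machine carries an independent status. The binary (up/down) status, the machine counts, and the helper names `num_states` and `belief_bytes` are illustrative assumptions, not details stated in the abstract. Under that assumption, 25 machines yield about 33.5 million states, and a single dense belief vector over them already needs a few hundred megabytes.

```python
# Illustrative sketch only: the binary-status encoding is an assumption,
# not something the abstract specifies.

def num_states(num_machines: int, statuses_per_machine: int = 2) -> int:
    """Size of the flat state space when each machine has an independent status."""
    return statuses_per_machine ** num_machines


def belief_bytes(n_states: int, bytes_per_float: int = 8) -> int:
    """Memory for one dense belief vector (one probability per state)."""
    return n_states * bytes_per_float


if __name__ == "__main__":
    for n in (10, 16, 25):
        s = num_states(n)
        # The belief dimension equals the number of states, so it grows
        # exponentially with the number of machines.
        print(f"{n} machines: {s} states, {belief_bytes(s)} bytes per belief")
```

The exponential growth of the belief dimension is what a compression step such as VDC is meant to counter; the sketch does not implement VDC or BPI themselves.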

[1] Craig Boutilier, et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations, 1996, AAAI/IAAI, Vol. 2.

[2] Michael L. Littman, et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes, 1997, UAI.

[3] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[4] Eric A. Hansen, et al. Solving POMDPs by Searching in Policy Space, 1998, UAI.

[5] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.

[6] Zhengzhu Feng, et al. Dynamic Programming for POMDPs Using a Factored State Representation, 2000, AIPS.

[7] Ronald E. Parr, et al. Solving Factored POMDPs with Linear Value Functions, 2001.

[8] Carlos Guestrin, et al. Max-norm Projections for Factored MDPs, 2001, IJCAI.

[9] Zhengzhu Feng, et al. Approximate Planning for Factored POMDPs, 2001.

[10] Douglas Aberdeen, et al. Scalable Internal-State Policy-Gradient Methods for POMDPs, 2002, ICML.

[11] Nicholas Roy, et al. Exponential Family PCA for Belief Compression in POMDPs, 2002, NIPS.

[12] Jonathan Baxter, et al. Scaling Internal-State Policy-Gradient Methods for POMDPs, 2002.

[13] Craig Boutilier, et al. Value-Directed Compression of POMDPs, 2002, NIPS.

[14] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.

[15] Craig Boutilier, et al. Bounded Finite State Controllers, 2003, NIPS.

[16] Craig Boutilier, et al. Stochastic Local Search for POMDP Controllers, 2004, AAAI.

[17] Nikos A. Vlassis, et al. A point-based POMDP algorithm for robot planning, 2004, IEEE International Conference on Robotics and Automation (ICRA '04).