Paper information - VDCBPI: an Approximate Scalable Algorithm for Large POMDPs

Existing algorithms for discrete partially observable Markov decision processes can at best solve problems of a few thousand states due to two important sources of intractability: the curse of dimensionality and the policy space complexity. This paper describes a new algorithm (VDCBPI) that mitigates both sources of intractability by combining the Value Directed Compression (VDC) technique [13] with Bounded Policy Iteration (BPI) [15]. The scalability of VDCBPI is demonstrated on synthetic network management problems with up to 33 million states.

Craig Boutilier | Pascal Poupart

[1] Craig Boutilier, et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations, 1996, AAAI/IAAI, Vol. 2.

[2] Michael L. Littman, et al. Incremental Pruning: A Simple, Fast, Exact Method for Partially Observable Markov Decision Processes, 1997, UAI.

[3] Leslie Pack Kaelbling, et al. Planning and Acting in Partially Observable Stochastic Domains, 1998, Artif. Intell.

[4] Eric A. Hansen, et al. Solving POMDPs by Searching in Policy Space, 1998, UAI.

[5] Kee-Eung Kim, et al. Learning Finite-State Controllers for Partially Observable Environments, 1999, UAI.

[6] Zhengzhu Feng, et al. Dynamic Programming for POMDPs Using a Factored State Representation, 2000, AIPS.

[7] Ronald E. Parr, et al. Solving Factored POMDPs with Linear Value Functions, 2001.

[8] Carlos Guestrin, et al. Max-norm Projections for Factored MDPs, 2001, IJCAI.

[9] Zhengzhu Feng, et al. Approximate Planning for Factored POMDPs, 2001.

[10] Douglas Aberdeen, et al. Scalable Internal-State Policy-Gradient Methods for POMDPs, 2002, ICML.

[11] Nicholas Roy, et al. Exponential Family PCA for Belief Compression in POMDPs, 2002, NIPS.

[12] Jonathan Baxter, et al. Scaling Internal-State Policy-Gradient Methods for POMDPs, 2002.

[13] Craig Boutilier, et al. Value-Directed Compression of POMDPs, 2002, NIPS.

[14] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.

[15] Craig Boutilier, et al. Bounded Finite State Controllers, 2003, NIPS.

[16] Craig Boutilier, et al. Stochastic Local Search for POMDP Controllers, 2004, AAAI.

[17] Nikos A. Vlassis, et al. A point-based POMDP algorithm for robot planning, 2004, IEEE International Conference on Robotics and Automation (ICRA '04).