Paper information - Point-Based POMDP Algorithms: Improved Analysis and Implementation

Existing complexity bounds for point-based POMDP value iteration algorithms focus either on the curse of dimensionality or the curse of history. We derive a new bound that relies on both and uses the concept of discounted reachability; our conclusions may help guide future algorithm design. We also discuss recent improvements to our (point-based) heuristic search value iteration algorithm. Our new implementation calculates tighter initial bounds, avoids solving linear programs, and makes more effective use of sparsity.

Reid G. Simmons | Trey Smith
