Paper Information - Forward Search Value Iteration for POMDPs

Forward Search Value Iteration for POMDPs

Recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods, which quickly converge to an approximate solution for medium-sized problems. Of this family, HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. In this paper we suggest a new algorithm, FSVI, that uses the underlying MDP to traverse the belief space towards rewards, finding sequences of useful backups, and show how it scales up better than HSVI on larger benchmarks.
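The idea in the abstract, following the underlying MDP through belief space and then backing up the visited beliefs, can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-cell corridor model, the function names (`fsvi_trial`, `backup`, `run_fsvi`), the constant lower-bound initialization, and the choice to back up beliefs in reverse visiting order.

```python
import random

# Hypothetical toy POMDP (not from the paper): a 3-cell corridor, goal at cell 2.
N_STATES, GOAL, GAMMA = 3, 2, 0.95
ACTIONS = (0, 1)  # 0 = left, 1 = right
OBS = (0, 1)      # noisy "am I at the goal?" sensor

def trans(s, a):
    """Return {next_state: probability}; a move succeeds with probability 0.8."""
    if s == GOAL:
        return {GOAL: 1.0}  # the goal is absorbing
    target = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    return {s: 1.0} if target == s else {target: 0.8, s: 0.2}

def reward(s, a):
    return 0.0 if s == GOAL else -1.0  # unit cost per step until the goal

def obs_prob(s2, o):
    p_yes = 0.9 if s2 == GOAL else 0.1
    return p_yes if o == 1 else 1.0 - p_yes

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def q_value(V, s, a):
    return reward(s, a) + GAMMA * sum(p * V[s2] for s2, p in trans(s, a).items())

def solve_mdp(iters=200):
    """Value iteration on the fully observable (underlying) MDP."""
    V = [0.0] * N_STATES
    for _ in range(iters):
        V = [max(q_value(V, s, a) for a in ACTIONS) for s in range(N_STATES)]
    return V

def belief_update(b, a, o):
    nb = [0.0] * N_STATES
    for s, bs in enumerate(b):
        for s2, p in trans(s, a).items():
            nb[s2] += bs * p * obs_prob(s2, o)
    z = sum(nb)
    return [x / z for x in nb]

def backup(b, alphas):
    """Standard point-based backup: the best new alpha vector at belief b."""
    best, best_val = None, float("-inf")
    for a in ACTIONS:
        g_a = [reward(s, a) for s in range(N_STATES)]
        for o in OBS:
            cands = [[sum(p * obs_prob(s2, o) * al[s2]
                          for s2, p in trans(s, a).items())
                      for s in range(N_STATES)] for al in alphas]
            g_ao = max(cands, key=lambda g: dot(b, g))
            g_a = [x + GAMMA * y for x, y in zip(g_a, g_ao)]
        if dot(b, g_a) > best_val:
            best, best_val = g_a, dot(b, g_a)
    return best

def value(b, alphas):
    return max(dot(b, al) for al in alphas)

def fsvi_trial(b0, v_mdp, alphas, rng, max_depth=50):
    """One trial: act greedily for the hidden state s under the MDP values,
    track the resulting beliefs, then back them up in reverse order."""
    s = rng.choices(range(N_STATES), weights=b0)[0]
    b, path = list(b0), []
    while s != GOAL and len(path) < max_depth:
        a = max(ACTIONS, key=lambda act: q_value(v_mdp, s, act))
        path.append(b)
        nxt = trans(s, a)
        s = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        o = rng.choices(OBS, weights=[obs_prob(s, ob) for ob in OBS])[0]
        b = belief_update(b, a, o)
    for visited in reversed(path):
        alphas.append(backup(visited, alphas))

def run_fsvi(n_trials=20, seed=0):
    rng = random.Random(seed)
    v_mdp = solve_mdp()
    # Assumed initialization: a constant lower bound, (minimum reward) / (1 - gamma).
    alphas = [[-1.0 / (1.0 - GAMMA)] * N_STATES]
    for _ in range(n_trials):
        fsvi_trial([1.0, 0.0, 0.0], v_mdp, alphas, rng)
    return alphas, v_mdp
```

Calling `run_fsvi()` returns the accumulated alpha vectors together with the MDP values. In this sketch the value at the initial belief rises from the initial lower bound while staying below the MDP value, which upper-bounds the POMDP value.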

[1] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.

[2] Leslie Pack Kaelbling, et al. Learning Policies for Partially Observable Environments: Scaling Up, 1997, ICML.

[3] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[4] Hector Geffner, et al. Solving Large POMDPs using Real Time Dynamic Programming, 1998.

[5] Eric A. Hansen, et al. An Improved Grid-Based Approximation Algorithm for POMDPs, 2001, IJCAI.

[6] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.

[7] Craig Boutilier, et al. Bounded Finite State Controllers, 2003, NIPS.

[8] Nikos A. Vlassis, et al. Perseus: Randomized Point-based Value Iteration for POMDPs, 2005, J. Artif. Intell. Res.

[9] Reid G. Simmons, et al. Point-Based POMDP Algorithms: Improved Analysis and Implementation, 2005, UAI.