论文信息 - A fast point-based algorithm for POMDPs

A fast point-based algorithm for POMDPs

We describe a point-based approximate value iteration algorithm for partially observable Markov decision processes. The algorithm performs value function updates ensuring that in each iteration the new value function is an upper bound to the previous value function, as estimated on a sampled set of belief points. A randomized belief-point selection scheme allows for fast update steps. Results indicate that the proposed algorithm achieves competitive performance, both in terms of solution quality as well as speed.

N. Vlassis | M. Spaan

[1] E. J. Sondik,et al. The Optimal Control of Partially Observable Markov Decision Processes. , 1971 .

[2] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..

[3] Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.

[4] Ronen I. Brafman,et al. A Heuristic Variable Grid Solution Method for POMDPs , 1997, AAAI/IAAI.

[5] Leslie Pack Kaelbling,et al. Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[6] Anne Condon,et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems , 1999, AAAI/IAAI.

[7] Milos Hauskrecht,et al. Value-Function Approximations for Partially Observable Markov Decision Processes , 2000, J. Artif. Intell. Res..

[8] Weihong Zhang,et al. Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes , 2011, J. Artif. Intell. Res..

[9] Eric A. Hansen,et al. An Improved Grid-Based Approximation Algorithm for POMDPs , 2001, IJCAI.

[10] N. Zhang,et al. Algorithms for partially observable markov decision processes , 2001 .

[11] Kin Man Poon,et al. A fast heuristic algorithm for decision-theoretic planning , 2001 .

[12] Nicholas Roy,et al. Exponential Family PCA for Belief Compression in POMDPs , 2002, NIPS.

[13] Joelle Pineau,et al. Point-based value iteration: An anytime algorithm for POMDPs , 2003, IJCAI.

[14] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.