Heuristic Search Value Iteration for POMDPs

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of more than a factor of 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
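
To make the piecewise linear convex representation concrete, the following minimal Python sketch evaluates a value function stored as a set of vectors (commonly called alpha vectors) at a belief, by taking the maximum dot product. It also shows one way a regret bound could be read off, assuming the planner additionally holds an upper bound on the optimal value at the same belief. The function names, the two-state example, and the numbers are hypothetical illustrations, not taken from the paper.

```python
def pwlc_value(belief, alpha_vectors):
    """Evaluate a piecewise linear convex function at a belief.

    Each alpha vector defines one linear piece; the function value is the
    maximum over pieces of the dot product with the belief.
    """
    return max(
        sum(a * b for a, b in zip(alpha, belief))
        for alpha in alpha_vectors
    )


def regret_bound(belief, alpha_vectors, upper_value):
    """Gap between an assumed upper bound and the PWLC lower bound.

    Assumption: upper_value is a valid upper bound on the optimal value at
    this belief, supplied by the caller. How it is computed is not shown.
    """
    return upper_value - pwlc_value(belief, alpha_vectors)


# Hypothetical two-state example.
alpha_vectors = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
belief = [0.5, 0.5]

lower = pwlc_value(belief, alpha_vectors)
gap = regret_bound(belief, alpha_vectors, upper_value=1.0)
```

Because the maximum of linear functions is convex, adding a vector to the set can only raise the represented value, which is what makes this form convenient for a lower bound that improves over time.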
