Improving Point-Based POMDP Policies at Run-Time

Point-based algorithms have been widely used to compute approximate solutions for POMDPs. While they work well in many cases, they can perform very poorly if the belief state encountered at run time is not well covered by the points sampled offline. In this paper we propose several heuristic functions for estimating when an offline approximate policy is likely to perform poorly at the current belief point. We show that plan repair, incrementally improving the policy at run time, can substantially improve overall performance. These approaches are particularly useful in domains where each action has a large number of possible outcomes, so many points would be required to cover the reachable belief space. A common example is fault recovery, where there are many possible faults, each occurring with low probability, and recovery actions must be executed to remove them. We demonstrate the approach by adding plan repair to SARSOP, a state-of-the-art point-based value iteration algorithm, and show that it can considerably improve performance in some domains with minimal computational cost.
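As a rough illustration only (not the paper's actual method or code), the run-time check might resemble the sketch below: evaluate the offline policy's alpha-vector value at the current belief, compare it against a cheap upper bound such as a QMDP-style bound, and trigger an incremental repair when the gap is large. The `solver.improve` hook, the QMDP bound, and the gap threshold are assumptions made for illustration; the paper's specific heuristic functions and repair mechanism are described in the body.

```python
import numpy as np

def policy_value(belief, alpha_vectors):
    """Lower-bound value of the offline policy at `belief`: max over alpha . b."""
    return max(np.dot(alpha, belief) for alpha in alpha_vectors)

def qmdp_upper_bound(belief, mdp_values):
    """Cheap upper bound on the optimal value from the underlying MDP (illustrative choice)."""
    return np.dot(mdp_values, belief)

def maybe_repair(belief, alpha_vectors, mdp_values, solver, gap_threshold=0.1):
    """Trigger run-time plan repair when the bound gap at `belief` is large."""
    lower = policy_value(belief, alpha_vectors)
    upper = qmdp_upper_bound(belief, mdp_values)
    if upper - lower > gap_threshold:
        # `solver.improve` is a hypothetical hook for a short, belief-seeded
        # point-based re-solve that returns additional alpha-vectors.
        alpha_vectors = alpha_vectors + solver.improve(belief, time_budget=0.05)
    return alpha_vectors
```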