Improving Point-Based POMDP Policies at Run-Time

Point-based algorithms have been widely used to compute approximate solutions for POMDPs. While they work well in many cases, they can perform very poorly if the belief state encountered at run time is not well covered by the points sampled offline. In this paper we propose several heuristic functions for estimating when an offline approximate policy is likely to perform poorly at the current belief point. We show that plan repair, incrementally improving the policy at run time, can substantially improve overall performance. These approaches are particularly useful in domains where each action has a large number of possible outcomes, so many points would be required to cover the reachable belief space. A common example is fault recovery, where there are many possible faults, each occurring with low probability, and recovery actions must be executed to remove them. We demonstrate the approach by adding plan repair to SARSOP, a state-of-the-art point-based value iteration algorithm, and show that it can considerably improve performance in some domains with minimal computational cost.
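As a rough illustration only (not the paper's actual method or code), the run-time check might resemble the sketch below: evaluate the offline policy's alpha-vector value at the current belief, compare it against a cheap upper bound such as a QMDP-style bound, and trigger an incremental repair when the gap is large. The `solver.improve` hook, the QMDP bound, and the gap threshold are assumptions made for illustration; the paper's specific heuristic functions and repair mechanism are described in the body.

```python
import numpy as np

def policy_value(belief, alpha_vectors):
    """Lower-bound value of the offline policy at `belief`: max over alpha . b."""
    return max(np.dot(alpha, belief) for alpha in alpha_vectors)

def qmdp_upper_bound(belief, mdp_values):
    """Cheap upper bound on the optimal value from the underlying MDP (illustrative choice)."""
    return np.dot(mdp_values, belief)

def maybe_repair(belief, alpha_vectors, mdp_values, solver, gap_threshold=0.1):
    """Trigger run-time plan repair when the bound gap at `belief` is large."""
    lower = policy_value(belief, alpha_vectors)
    upper = qmdp_upper_bound(belief, mdp_values)
    if upper - lower > gap_threshold:
        # `solver.improve` is a hypothetical hook for a short, belief-seeded
        # point-based re-solve that returns additional alpha-vectors.
        alpha_vectors = alpha_vectors + solver.improve(belief, time_budget=0.05)
    return alpha_vectors
```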