A Hybrid Heuristic Value Iteration Algorithm for POMDP

Point-based value iteration methods are an effective class of algorithms for solving POMDP models. However, most of these algorithms explore the belief point set using a single heuristic criterion, which limits their effectiveness. This paper presents a value iteration algorithm (HHVI) that explores the belief point set using hybrid heuristic criteria. HHVI maintains upper and lower bounds on the value function, filters out belief points whose gap between the upper and lower bounds is below a threshold, and then explores the belief point farthest from the already explored point set. By ensuring that the explored points are effectively and fully distributed over the reachable belief space, HHVI improves both the quality and the efficiency of convergence. Experimental results on four benchmark problems show that HHVI can obtain a better approximation to the global optimal solution.
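
The selection step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `select_belief`, the use of the L1 norm as the distance measure, and the representation of the bounds as plain callables are all assumptions made for illustration.

```python
import numpy as np

def select_belief(candidates, explored, upper, lower, eps):
    """Hypothetical sketch of a hybrid heuristic selection step.

    candidates: (n, |S|) array of candidate successor beliefs
    explored:   (m, |S|) array of already explored beliefs
    upper, lower: callables mapping a belief to its bound value (assumed)
    eps: threshold on the gap between upper and lower bounds
    Returns the chosen belief, or None if every candidate is filtered out.
    """
    candidates = np.asarray(candidates, dtype=float)
    explored = np.asarray(explored, dtype=float)
    best, best_dist = None, -1.0
    for b in candidates:
        # Criterion 1: skip beliefs whose bound gap is already below eps.
        if upper(b) - lower(b) < eps:
            continue
        # Criterion 2: distance to the explored set is the L1 distance
        # to the nearest explored belief (L1 is an assumption here).
        dist = np.abs(explored - b).sum(axis=1).min()
        if dist > best_dist:
            best, best_dist = b, dist
    return best
```

In this sketch the bound gap acts as a filter and the distance acts as the ranking criterion, so a candidate that lies far from the explored set is still skipped when its value is already tightly bounded.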