A multi-criteria value iteration algorithm for POMDP problems

Point-based value iteration algorithms have been studied extensively for solving POMDP problems. However, most of these algorithms explore the belief point set using only a single heuristic criterion, which limits their effectiveness. This paper presents a novel value iteration algorithm (MCVI) that explores the belief point set using multiple criteria. MCVI filters out the belief points at which the gap between the upper and lower bounds of the value function is less than a threshold, and then explores the successor belief point that is farthest from the already explored belief point set. By ensuring that the explored points are effective and well distributed over the reachable belief space, MCVI improves both the quality and the efficiency of convergence. Experimental results on four benchmark problems show that MCVI obtains a better global optimal solution.
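
The two exploration criteria described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`select_successor`, `l1_distance`), the use of the L1 norm as the distance measure, the callables `upper` and `lower` standing in for the value-function bounds, and the convention that a point's distance to the explored set is its distance to the nearest explored point are all assumptions made for illustration. The explored set is assumed to be non-empty (for example, it contains the initial belief).

```python
from typing import Callable, Optional, Sequence, Tuple

Belief = Tuple[float, ...]


def l1_distance(b1: Belief, b2: Belief) -> float:
    """L1 distance between two belief vectors (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(b1, b2))


def select_successor(
    candidates: Sequence[Belief],
    explored: Sequence[Belief],
    upper: Callable[[Belief], float],
    lower: Callable[[Belief], float],
    threshold: float,
) -> Optional[Belief]:
    """Pick the next belief point to explore, or None if none qualifies.

    Criterion 1: discard candidates whose bound gap is below the threshold.
    Criterion 2: among the rest, take the one farthest from the explored set,
    where distance to the set is the distance to its nearest member.
    """
    open_candidates = [
        b for b in candidates if upper(b) - lower(b) >= threshold
    ]
    if not open_candidates:
        return None
    return max(
        open_candidates,
        key=lambda b: min(l1_distance(b, e) for e in explored),
    )


# Toy two-state example with hypothetical bound gaps.
explored = [(0.5, 0.5)]
candidates = [(0.6, 0.4), (0.9, 0.1), (0.0, 1.0)]
gaps = {(0.6, 0.4): 0.5, (0.9, 0.1): 0.3, (0.0, 1.0): 0.01}

chosen = select_successor(
    candidates,
    explored,
    upper=lambda b: gaps[b],  # toy upper bound
    lower=lambda b: 0.0,      # toy lower bound
    threshold=0.1,
)
print(chosen)
```

In this toy example the point `(0.0, 1.0)` is the farthest from the explored set, but it is discarded because its bound gap is already below the threshold, so the farthest remaining candidate is chosen instead.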