Paper Information - A Hybrid Heuristic Value Iteration Algorithm for POMDP

A Hybrid Heuristic Value Iteration Algorithm for POMDP

Point-based value iteration methods are an effective class of algorithms for solving POMDP models. However, most of these algorithms explore the belief point set using a single heuristic criterion, which limits their effectiveness. This paper presents HHVI, a value iteration algorithm that explores the belief point set using hybrid heuristic criteria. HHVI maintains upper and lower bounds on the value function, filters out belief points whose gap between the upper and lower bounds is below a threshold, and explores the belief point farthest from the already explored point set. By ensuring that the explored points are effectively and fully distributed over the reachable belief space, HHVI can improve both the quality and the efficiency of convergence. Experimental results on four benchmarks show that HHVI can obtain better global optimal solutions.
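The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`l1_distance`, `select_next_belief`), the use of the L1 norm as the distance measure, and the representation of the bounds as plain callables are all assumptions made for illustration, since the abstract does not specify them.

```python
from typing import Callable, Iterable, Optional, Sequence

Belief = Sequence[float]  # a probability distribution over states


def l1_distance(a: Belief, b: Belief) -> float:
    """L1 distance between two beliefs (assumed metric, not stated in the abstract)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_next_belief(
    candidates: Iterable[Belief],
    explored: Sequence[Belief],
    upper: Callable[[Belief], float],
    lower: Callable[[Belief], float],
    threshold: float,
) -> Optional[Belief]:
    """Hypothetical hybrid selection: bound-gap filter, then farthest point."""
    best, best_dist = None, -1.0
    for b in candidates:
        # Criterion 1: skip beliefs whose upper/lower bound gap is already small.
        if upper(b) - lower(b) < threshold:
            continue
        # Criterion 2: distance from b to the explored set (nearest explored point).
        dist = min((l1_distance(b, e) for e in explored), default=float("inf"))
        if dist > best_dist:
            best, best_dist = b, dist
    return best  # None if every candidate was filtered out


# Toy usage with two states; the bound functions are made up for the example.
explored = [(1.0, 0.0)]
candidates = [(0.9, 0.1), (0.5, 0.5), (0.0, 1.0)]
upper = lambda b: 1.0
lower = lambda b: 1.0 if b[1] == 1.0 else 0.0  # gap is zero only at (0.0, 1.0)
chosen = select_next_belief(candidates, explored, upper, lower, threshold=0.1)
```

In this toy case the belief `(0.0, 1.0)` is the farthest from the explored set but is filtered out because its bound gap is below the threshold, so the point chosen is the farthest of the remaining candidates. How candidates are generated and how the bounds are updated are left out, as the abstract gives no detail on either.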

Feng Liu | Xin Jin | Xia Hua

[1] Jesse Hoey, et al. A Decision-Theoretic Approach to Task Assistance for Persons with Dementia, 2005, IJCAI.

[2] Kee-Eung Kim, et al. Closing the Gap: Improved Bounds on Optimal POMDP Solutions, 2011, ICAPS.

[3] Reid G. Simmons, et al. Point-Based POMDP Algorithms: Improved Analysis and Implementation, 2005, UAI.

[4] Edward J. Sondik, et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon, 1973, Oper. Res.

[5] Steve J. Young, et al. Partially observable Markov decision processes for spoken dialog systems, 2007, Comput. Speech Lang.

[6] Trey Smith, et al. Probabilistic planning for robotic exploration, 2007.

[7] Guy Shani, et al. A Survey of Point-Based POMDP Solvers, 2022.

[8] Joelle Pineau, et al. Point-based value iteration: An anytime algorithm for POMDPs, 2003, IJCAI.