Escaping local optima in POMDP planning as inference

Planning as inference has recently emerged as a versatile approach to decision-theoretic planning and reinforcement learning for single- and multi-agent systems in fully and partially observable domains with discrete and continuous variables. Since planning as inference essentially tackles a non-convex optimization problem when states are partially observable, there is a need for techniques that can robustly escape local optima. We propose two such algorithms: the first adds nodes to the finite-state controller according to an increasingly deep forward search, while the second splits nodes in a greedy fashion to improve the reward likelihood.
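
To make the node-splitting idea concrete, the following is a minimal sketch of a greedy split pass on a stochastic finite-state controller for a tabular POMDP. The toy POMDP arrays (`T`, `Z`, `R`), the perturb-and-reroute split heuristic, and all function names are illustrative assumptions rather than the paper's exact procedure; the sketch only demonstrates the general pattern of evaluating candidate splits and keeping one when it improves the controller's value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy tabular POMDP: S states, A actions, O observations.
S, A, O, gamma = 4, 2, 3, 0.95
T = rng.dirichlet(np.ones(S), size=(S, A))   # T[s, a, s'] transition model
Z = rng.dirichlet(np.ones(O), size=(A, S))   # Z[a, s', o] observation model
R = rng.uniform(0, 1, size=(S, A))           # R[s, a]     reward model
b0 = np.full(S, 1.0 / S)                     # initial belief

def evaluate(psi, eta, iters=300):
    """Expected discounted reward of a stochastic finite-state controller.

    psi[n, a]       = P(a | n)        action distribution per node
    eta[n, a, o, m] = P(m | n, a, o)  node-transition distribution
    Iterates V(n, s) to (near) fixed point, then returns the value of
    starting in node 0 under the initial belief.
    """
    V = np.zeros((psi.shape[0], S))
    for _ in range(iters):
        # Continuation value C[n, s, a] = sum_{s', o, m} T Z eta V
        C = np.einsum('sap,apo,naom,mp->nsa', T, Z, eta, V)
        V = np.einsum('na,nsa->ns', psi, R[None] + gamma * C)
    return V[0] @ b0

def split_node(psi, eta, n, noise=0.1):
    """Duplicate node n with perturbed parameters (illustrative split)."""
    N = psi.shape[0]

    def perturb(p):
        q = p + noise * rng.random(p.shape)
        return q / q.sum(axis=-1, keepdims=True)

    psi2 = np.concatenate([psi, perturb(psi[n:n+1])], axis=0)
    # Grow eta to (N+1, A, O, N+1): the new row is a perturbed copy of
    # node n; half the probability mass flowing into n is rerouted to it.
    eta2 = np.zeros((N + 1, A, O, N + 1))
    eta2[:N, :, :, :N] = eta
    eta2[N, :, :, :N] = perturb(eta[n])
    eta2[..., N] = 0.5 * eta2[..., n]
    eta2[..., n] *= 0.5
    return psi2, eta2

# Greedy pass: try splitting each node, keep a split when it improves value.
psi = rng.dirichlet(np.ones(A), size=2)            # random 2-node controller
eta = rng.dirichlet(np.ones(2), size=(2, A, O))
best = evaluate(psi, eta)
for n in range(psi.shape[0]):
    cand = split_node(psi, eta, n)
    v = evaluate(*cand)
    if v > best:
        best, (psi, eta) = v, cand
print(f"controller value after greedy split pass: {best:.4f}")
```

In a planning-as-inference setting one would score candidates by reward likelihood and re-run EM on the controller parameters after each accepted split; the plain value computation above stands in for that step to keep the sketch self-contained.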