Analyzing and Escaping Local Optima in Planning as Inference for Partially Observable Domains

Planning as inference has recently emerged as a versatile approach to decision-theoretic planning and reinforcement learning for single-agent and multi-agent systems in fully and partially observable domains with discrete and continuous variables. Since planning as inference essentially tackles a non-convex optimization problem when the states are partially observable, techniques are needed that can robustly escape local optima. We investigate the local optima of finite state controllers for single-agent partially observable Markov decision processes (POMDPs) optimized by expectation maximization (EM). We show that EM converges to controllers that are optimal with respect to a one-step lookahead. To escape local optima, we propose two algorithms: the first adds nodes to the controller to ensure optimality with respect to a multi-step lookahead, while the second splits nodes in a greedy fashion to improve the reward likelihood. Both approaches are demonstrated empirically on benchmark problems.
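
To make the controller optimization concrete, below is a minimal sketch of EM-style optimization of a stochastic finite state controller on a small randomly generated POMDP. It is an illustration under stated assumptions, not the paper's implementation: the toy model, the problem sizes, and the names psi (action selection), eta (node transitions), and alpha (discounted occupancy) are all assumptions; only the action parameters are updated here, and rewards are kept nonnegative so they can play the role of a likelihood in the planning-as-inference view.

import numpy as np

rng = np.random.default_rng(0)
S, A, O, N = 4, 3, 2, 3   # states, actions, observations, controller nodes (illustrative sizes)
gamma = 0.9

# Random toy POMDP: T[a,s,s'] transitions, Z[a,s',o] observations, nonnegative rewards R[s,a], initial belief b0.
T = rng.random((A, S, S)); T /= T.sum(axis=2, keepdims=True)
Z = rng.random((A, S, O)); Z /= Z.sum(axis=2, keepdims=True)
R = rng.random((S, A))
b0 = np.full(S, 1.0 / S)

# Stochastic finite state controller: psi[n,a] = P(a|n), eta[n,a,o,n'] = P(n'|n,a,o).
psi = rng.random((N, A)); psi /= psi.sum(axis=1, keepdims=True)
eta = rng.random((N, A, O, N)); eta /= eta.sum(axis=3, keepdims=True)

def backup(V):
    # One application of the controller's Bellman recursion to the value table V[n,s].
    Vnew = np.zeros((N, S))
    for n in range(N):
        for s in range(S):
            for a in range(A):
                cont = sum(T[a, s, s2] * Z[a, s2, o] * (eta[n, a, o] @ V[:, s2])
                           for s2 in range(S) for o in range(O))
                Vnew[n, s] += psi[n, a] * (R[s, a] + gamma * cont)
    return Vnew

def propagate(alpha):
    # Push the (node, state) occupancy one step through the controller and the model.
    anew = np.zeros((N, S))
    for n in range(N):
        for s in range(S):
            for a in range(A):
                for s2 in range(S):
                    for o in range(O):
                        anew[:, s2] += alpha[n, s] * psi[n, a] * T[a, s, s2] * Z[a, s2, o] * eta[n, a, o]
    return anew

for it in range(10):
    # E-step: controller value V[n,s] and discounted occupancy alpha[n,s]; node 0 is the start node.
    V = np.zeros((N, S))
    for _ in range(50):
        V = backup(V)
    alpha0 = np.zeros((N, S)); alpha0[0] = b0
    alpha = alpha0.copy()
    for _ in range(50):
        alpha = alpha0 + gamma * propagate(alpha)   # fixed point of the discounted occupancy
    print(f"EM iteration {it}: controller value {b0 @ V[0]:.3f}")

    # M-step for psi: reweight each action by its expected immediate-plus-discounted-future
    # reward under the occupancy, then renormalize (eta would be updated analogously).
    Q = np.zeros((N, A))
    for n in range(N):
        for a in range(A):
            for s in range(S):
                cont = sum(T[a, s, s2] * Z[a, s2, o] * (eta[n, a, o] @ V[:, s2])
                           for s2 in range(S) for o in range(O))
                Q[n, a] += alpha[n, s] * (R[s, a] + gamma * cont)
    psi = psi * Q
    psi /= psi.sum(axis=1, keepdims=True)

Successive iterations of this loop typically increase the controller value, but they can stall at a local optimum of the reward likelihood, which is the situation the paper's node-addition and node-splitting procedures are designed to escape.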
