
Monte-Carlo Expectation Maximization for Decentralized POMDPs

We address two significant drawbacks of state-of-the-art solvers of decentralized POMDPs (DEC-POMDPs): their reliance on complete knowledge of the model and their limited scalability as the complexity of the domain grows. We extend a recently proposed approach that solves DEC-POMDPs by reducing them to a maximum likelihood problem, which can in turn be solved using expectation maximization (EM). We introduce a model-free version of this approach that employs Monte-Carlo EM (MCEM). Although a naive implementation of MCEM is inadequate in multiagent settings, we introduce several improvements in sampling that produce high-quality results on a variety of DEC-POMDP benchmarks, including large problems with thousands of agents.
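In the likelihood-maximization formulation this work builds on, rewards rescaled to [0, 1] are read as the probability of a binary "success" event, so maximizing expected reward becomes maximizing the likelihood of that event. Monte-Carlo EM then replaces the exact E-step with episodes drawn from a simulator. The sketch below illustrates only that core loop, under stated assumptions: a hypothetical horizon-1 problem with two agents and memoryless stochastic policies. The toy domain and the names `simulate`, `mcem`, `evaluate` and `P_CORRECT` are invented for illustration. The paper's policy representation, longer horizons and the sampling improvements mentioned in the abstract are not reproduced. Each M-step sets agent i's policy pi_i(a | o) proportional to the summed reward of the sampled episodes in which that agent observed o and chose a.

```python
import random

# Hypothetical horizon-1 toy DEC-POMDP (an assumption, not from the paper):
# a hidden state s in {0, 1} is drawn uniformly, each of two agents privately
# observes s correctly with probability P_CORRECT, and the team earns reward 1
# only if both agents choose action s. Rewards already lie in [0, 1], so a
# reward can be read as the probability of a binary "success" event.
P_CORRECT = 0.85
N_OBS, N_ACT, N_AGENTS = 2, 2, 2


def simulate(policies, rng):
    """Black-box simulator: sample one episode, return (obs, acts, reward)."""
    s = rng.randrange(2)
    obs = [s if rng.random() < P_CORRECT else 1 - s for _ in range(N_AGENTS)]
    acts = [rng.choices(range(N_ACT), weights=policies[i][obs[i]])[0]
            for i in range(N_AGENTS)]
    reward = 1.0 if all(a == s for a in acts) else 0.0
    return obs, acts, reward


def mcem(n_iters=20, n_samples=2000, smoothing=1e-3, seed=0):
    """Reward-weighted Monte-Carlo EM over independent per-agent policies.

    The learner never reads P_CORRECT; it only calls simulate().
    """
    rng = random.Random(seed)
    # policies[i][o][a] = probability that agent i takes action a after observing o.
    policies = [[[1.0 / N_ACT] * N_ACT for _ in range(N_OBS)]
                for _ in range(N_AGENTS)]
    for _ in range(n_iters):
        # Monte-Carlo E-step: episodes sampled under the current joint policy,
        # each weighted by its reward (its likelihood of the success event).
        # A small smoothing count keeps every probability strictly positive.
        counts = [[[smoothing] * N_ACT for _ in range(N_OBS)]
                  for _ in range(N_AGENTS)]
        for _ in range(n_samples):
            obs, acts, reward = simulate(policies, rng)
            for i in range(N_AGENTS):
                counts[i][obs[i]][acts[i]] += reward
        # M-step: each agent's policy becomes its own normalized weighted counts.
        for i in range(N_AGENTS):
            for o in range(N_OBS):
                total = sum(counts[i][o])
                policies[i][o] = [c / total for c in counts[i][o]]
    return policies


def evaluate(policies, n_samples=20000, seed=1):
    """Monte-Carlo estimate of the expected team reward of a joint policy."""
    rng = random.Random(seed)
    return sum(simulate(policies, rng)[2] for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    uniform = [[[0.5, 0.5] for _ in range(N_OBS)] for _ in range(N_AGENTS)]
    learned = mcem()
    print("uniform policy value:", evaluate(uniform))  # true value is 0.5 * 0.5 = 0.25
    print("learned policy value:", evaluate(learned))  # optimum is 0.85 ** 2 = 0.7225
    print("agent 0, P(a | o):", learned[0])
```

Because each agent's update uses only its own observation-action counts, the M-step decomposes per agent, which is what makes this style of update appealing for decentralized policies. A naive loop like this one is, however, exactly the kind of implementation the abstract describes as inadequate in harder multiagent settings.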

Feng Wu | Nicholas R. Jennings | Shlomo Zilberstein
