The Cross-Entropy Method for Policy Search in Decentralized POMDPs

Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs. Povzetek: Prispevek opisuje novo metodo multiagentnega naˇ crtovanja.

[1]  Makoto Yokoo,et al.  Letting loose a SPIDER on a network of POMDPs: generating quality guaranteed policies , 2007, AAMAS '07.

[2]  Avraham Shtub,et al.  Managing Stochastic, Finite Capacity, Multi-Project Systems through the Cross-Entropy Methodology , 2005, Ann. Oper. Res..

[3]  Nikos A. Vlassis,et al.  Optimal and Approximate Q-value Functions for Decentralized POMDPs , 2008, J. Artif. Intell. Res..

[4]  Shlomo Zilberstein,et al.  Dynamic Programming for Partially Observable Stochastic Games , 2004, AAAI.

[5]  Shie Mannor,et al.  A Tutorial on the Cross-Entropy Method , 2005, Ann. Oper. Res..

[6]  Leslie Pack Kaelbling,et al.  Planning and Acting in Partially Observable Stochastic Domains , 1998, Artif. Intell..

[7]  Dirk P. Kroese,et al.  Global likelihood optimization via the cross-entropy method with an application to mixture models , 2004, Proceedings of the 2004 Winter Simulation Conference, 2004..

[8]  S. Zilberstein,et al.  Bounded Dynamic Programming for Decentralized POMDPs , 2007 .

[9]  Shie Mannor,et al.  The Cross Entropy Method for Fast Policy Search , 2003, ICML.

[10]  S. Zilberstein,et al.  Optimal Fixed-Size Controllers for Decentralized POMDPs , 2006 .

[11]  Craig Boutilier,et al.  Planning, Learning and Coordination in Multiagent Decision Processes , 1996, TARK.

[12]  Manuela M. Veloso,et al.  Exploiting factored representations for decentralized execution in multiagent teams , 2007, AAMAS '07.

[13]  Shlomo Zilberstein,et al.  Bounded Policy Iteration for Decentralized POMDPs , 2005, IJCAI.

[14]  G. W. Wornell,et al.  Decentralized control of a multiple access broadcast channel: performance bounds , 1996, Proceedings of 35th IEEE Conference on Decision and Control.

[15]  W. Hoeffding Probability Inequalities for sums of Bounded Random Variables , 1963 .

[16]  François Charpillet,et al.  Point-based Dynamic Programming for DEC-POMDPs , 2006, AAAI.

[17]  Neil Immerman,et al.  The Complexity of Decentralized Control of Markov Decision Processes , 2000, UAI.

[18]  Claudia V. Goldman,et al.  Solving Transition Independent Decentralized Markov Decision Processes , 2004, J. Artif. Intell. Res..

[19]  Makoto Yokoo,et al.  Taming Decentralized POMDPs: Towards Efficient Policy Computation for Multiagent Settings , 2003, IJCAI.

[20]  François Charpillet,et al.  MAA*: A Heuristic Search Algorithm for Solving Decentralized POMDPs , 2005, UAI.

[21]  Claudia V. Goldman,et al.  Decentralized Control of Cooperative Systems: Categorization and Complexity Analysis , 2004, J. Artif. Intell. Res..

[22]  Shlomo Zilberstein,et al.  Memory-Bounded Dynamic Programming for DEC-POMDPs , 2007, IJCAI.

[23]  Shlomo Zilberstein,et al.  Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs , 2007, UAI.

[24]  Nikos A. Vlassis,et al.  A Cross-Entropy Approach to Solving Dec-POMDPs , 2007, IDC.

[25]  Carlos Guestrin,et al.  Multiagent Planning with Factored MDPs , 2001, NIPS.

[26]  François Charpillet,et al.  Mixed Integer Linear Programming for Exact Finite-Horizon Planning in Decentralized Pomdps , 2007, ICAPS.

[27]  Makoto Yokoo,et al.  Networked Distributed POMDPs: A Synergy of Distributed Constraint Optimization and POMDPs , 2005, IJCAI.

[28]  Dirk P. Kroese,et al.  Application of the Cross-Entropy Method to the Buffer Allocation Problem in a Simulation-Based Environment , 2005, Ann. Oper. Res..

[29]  Shimon Whiteson,et al.  Exploiting locality of interaction in factored Dec-POMDPs , 2008, AAMAS.

[30]  Jeff G. Schneider,et al.  Approximate solutions for partially observable stochastic games with common payoffs , 2004, Proceedings of the Third International Joint Conference on Autonomous Agents and Multiagent Systems, 2004. AAMAS 2004..

[31]  Francisco S. Melo,et al.  Interaction-driven Markov games for decentralized multiagent planning under uncertainty , 2008, AAMAS.

[32]  Ariel Rubinstein,et al.  A Course in Game Theory , 1995 .

[33]  Jeff G. Schneider,et al.  Game Theoretic Control for Robot Teams , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[34]  Nikos A. Vlassis,et al.  Using the Max-Plus Algorithm for Multiagent Decision Making in Coordination Graphs , 2005, BNAIC.

[35]  Anne Condon,et al.  On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems , 1999, AAAI/IAAI.

[36]  Makoto Yokoo,et al.  Exploiting Locality of Interaction in Networked Distributed POMDPs , 2006, AAAI Spring Symposium: Distributed Plan and Schedule Management.

[37]  Milind Tambe,et al.  The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models , 2011, J. Artif. Intell. Res..

[38]  Avi Pfeffer,et al.  Representations and Solutions for Game-Theoretic Problems , 1997, Artif. Intell..