Interaction-driven Markov games for decentralized multiagent planning under uncertainty

In this paper we propose interaction-driven Markov games (IDMGs), a new model for multiagent decision making under uncertainty. IDMGs describe multiagent decision problems in which interaction among agents is a local phenomenon. To this end, the model explicitly distinguishes between situations in which agents should interact and situations in which they can afford to act independently. The agents are coupled through joint rewards and joint transitions in the states in which they interact. The model combines several fundamental properties of transition-independent Dec-MDPs and weakly coupled MDPs, while allowing more general problems to be addressed in several respects. We introduce a fast approximate solution method for planning in IDMGs that exploits their particular structure, and we illustrate its successful application on several large multiagent tasks.
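The coupling structure sketched in the abstract can be illustrated with a small, informal data-structure sketch (this is our own illustration under assumptions, not the paper's formal definition; all names, such as `LocalMDP`, `IDMG`, and `interaction_states`, are hypothetical): each agent carries its own local MDP, a set of interaction states marks where the agents are coupled, and the joint dynamics factorize into independent local dynamics outside those states.

```python
# Illustrative sketch only: an IDMG-like structure where agents are coupled
# through joint transitions/rewards solely in designated interaction states.
# Names and the exact factorization are assumptions, not the authors' model.
from dataclasses import dataclass, field
from typing import Dict, Tuple, FrozenSet

State = str
Action = str
JointState = Tuple[State, ...]
JointAction = Tuple[Action, ...]


@dataclass
class LocalMDP:
    """Single-agent dynamics used whenever the agent acts independently."""
    transitions: Dict[Tuple[State, Action], Dict[State, float]]
    rewards: Dict[Tuple[State, Action], float]


@dataclass
class IDMG:
    """Agents are coupled only in the interaction states."""
    agents: Tuple[LocalMDP, ...]
    interaction_states: FrozenSet[JointState]
    joint_transitions: Dict[Tuple[JointState, JointAction],
                            Dict[JointState, float]] = field(default_factory=dict)
    joint_rewards: Dict[Tuple[JointState, JointAction], float] = field(default_factory=dict)

    def step_distribution(self, s: JointState, a: JointAction) -> Dict[JointState, float]:
        """Joint dynamics in interaction states; product of local dynamics otherwise."""
        if s in self.interaction_states:
            return self.joint_transitions[(s, a)]
        # Independent case: the next-state distribution factorizes over agents.
        dist: Dict[JointState, float] = {(): 1.0}
        for agent, (si, ai) in zip(self.agents, zip(s, a)):
            dist = {
                prefix + (s_next,): p * q
                for prefix, p in dist.items()
                for s_next, q in agent.transitions[(si, ai)].items()
            }
        return dist
```

A planner over such a structure can reuse independently computed single-agent policies outside the interaction states and reserve joint reasoning for the (typically small) coupled region, which is the kind of structure the proposed approximate solution method is said to exploit.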
