Scalable Planning in Multi-Agent MDPs

Multi-agent Markov Decision Processes (MMDPs) arise in a variety of applications including target tracking, control of multi-robot swarms, and multiplayer games. A key challenge in MMDPs is that the joint state and action spaces grow exponentially in the number of agents, making computation of an optimal policy intractable for medium- to large-scale problems. One property that has been exploited to mitigate this complexity is transition independence, in which each agent's transition probabilities are independent of the states and actions of the other agents. Transition independence enables factorization of the MMDP and computation of local agent policies, but it does not hold for arbitrary MMDPs. In this paper, we propose an approximate transition dependence property, called δ-transition dependence, and develop a metric for quantifying how far an MMDP deviates from transition independence. Our definition of δ-transition dependence recovers transition independence as a special case when δ is zero. We develop an algorithm whose runtime is polynomial in the number of agents and that achieves a provable bound relative to the global optimum when the reward functions are monotone increasing and submodular in the agent actions. We evaluate our approach on two case studies: multi-robot control and multi-agent patrolling.
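
The sketch below illustrates the general idea behind planning with monotone submodular rewards; it is not the paper's algorithm. It assumes a hypothetical `reward` function that maps a set of (agent, action) pairs to a real number and is monotone and submodular in that set, and it greedily assigns one action per agent instead of enumerating the exponentially many joint actions.

```python
# Illustrative sketch (not the paper's algorithm): greedy per-agent action
# selection for a reward assumed monotone and submodular in the chosen
# (agent, action) pairs. Greedy selection over such objectives carries a
# constant-factor approximation guarantee while avoiding enumeration of the
# |actions|^|agents| joint action space.

from itertools import product


def greedy_joint_action(agents, actions, reward):
    """Pick one action per agent, greedily maximizing marginal reward.

    `reward` is a hypothetical interface: it takes a frozenset of
    (agent, action) pairs and returns a real number. The loop uses
    O(|agents|^2 * |actions|) reward evaluations.
    """
    chosen = {}
    remaining = set(agents)
    while remaining:
        base = frozenset(chosen.items())
        # Evaluate the marginal gain of every unassigned (agent, action) pair
        # and commit to the best one.
        agent, action = max(
            ((a, u) for a, u in product(remaining, actions)),
            key=lambda au: reward(base | {au}) - reward(base),
        )
        chosen[agent] = action
        remaining.remove(agent)
    return chosen
```

Assigning one action per agent in this way is a greedy maximization over a partition matroid; the bound proved in the paper additionally accounts for how far the MMDP deviates from transition independence, as measured by δ.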
