Online planning for multi-agent systems with bounded communication

We propose an online algorithm for planning under uncertainty in multi-agent settings modeled as DEC-POMDPs. The algorithm helps overcome the high computational complexity of solving such problems offline. The key challenges in decentralized operation are to maintain coordinated behavior with little or no communication and, when communication is allowed, to maximize value while communicating as little as possible. The algorithm addresses these challenges by generating identical conditional plans from the agents' common knowledge and communicating only when an inconsistency in their histories is detected, which allows communication to be postponed when necessary. To be suitable for online operation, the algorithm computes good local policies with a new, fast local search method based on linear programming. Moreover, it bounds the amount of memory used at each step and can be applied to problems with arbitrary horizons. The experimental results confirm that the algorithm can solve problems that are too large for the best existing offline planning algorithms, and that it outperforms the best existing online method, producing much higher value with much less communication in most cases. The algorithm also proves effective when the communication channel is imperfect (periodically unavailable). These results contribute to the scalability of decision-theoretic planning in multi-agent settings.
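To make the execution model concrete, here is a minimal, self-contained Python sketch of the plan-and-sync idea described above. It is purely illustrative: the toy environment, the seeded policy rule, and the single-observation consistency test are stand-ins of our own, not the paper's algorithm or its history machinery.

    import random

    ACTIONS = ["left", "right"]
    OBSERVATIONS = ["hear-left", "hear-right"]

    def shared_policy(common, step):
        # Every agent seeds the same deterministic generator with the same
        # common knowledge, so all agents derive identical conditional
        # plans without exchanging any messages.
        rng = random.Random(hash((common, step)))
        return rng.choice(ACTIONS)

    def predicted_observation(action):
        # The observation the shared plan anticipates (a toy stand-in for
        # checking an agent's history against the common plan).
        return "hear-left" if action == "left" else "hear-right"

    def simulate(horizon=20, noise=0.2, seed=1):
        env = random.Random(seed)
        common = 0          # integer summary of the agents' common knowledge
        messages = 0
        for t in range(horizon):
            action = shared_policy(common, t)
            for agent in (0, 1):
                obs = (predicted_observation(action) if env.random() > noise
                       else env.choice(OBSERVATIONS))
                if obs != predicted_observation(action):
                    # Inconsistency detected: only now does this agent
                    # broadcast, and the report is folded into the common
                    # knowledge before the next planning step.
                    messages += 1
                    common = hash((common, agent, t, OBSERVATIONS.index(obs)))
        return messages

    if __name__ == "__main__":
        print("messages sent over 20 steps:", simulate())

The point of the sketch is the communication pattern: agents never exchange plans, and a message is sent only when an agent's private experience contradicts what the common plan predicted, which is what allows communication to be postponed or skipped when the channel is unavailable.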
