Coordinating Randomized Policies for Increasing Security in Multiagent Systems

Despite significant recent advances in decision-theoretic frameworks for reasoning about multiagent teams, little attention has been paid to applying such frameworks in adversarial domains, where an agent team may face security threats from other agents. This paper focuses on domains where such threats are posed by unseen adversaries whose actions or payoffs are unknown. In such domains, action randomization is recognized as a key technique for degrading an adversary's ability to predict and exploit the actions of an agent or agent team. Unfortunately, such randomization raises two key challenges. First, randomization can reduce the expected reward (quality) of the team's plans, so we must provide guarantees on that reward. Second, randomization causes miscoordination within teams. While communication within an agent team can help alleviate miscoordination, in many real domains communication is unavailable or only scarcely available. To address these challenges, this paper provides the following contributions. First, we recall the Multiagent Constrained MDP (MCMDP) framework, which enables policy generation for a team of agents in which each agent may have limited or no (communication) resources. Second, since randomized policies generated directly for an MCMDP lead to miscoordination, we introduce a transformation algorithm that converts the MCMDP into a transformed MCMDP incorporating explicit communication and no-communication actions. Third, we show that incorporating randomization yields a non-linear program, and that the limited availability or unavailability of communication adds non-convex constraints to that program. Finally, we experimentally illustrate the benefits of our approach.
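The core trade-off the abstract describes, randomizing actions while guaranteeing a minimum expected reward, can be illustrated with a minimal single-decision-point sketch: maximize the entropy of an action distribution subject to a reward floor. This is only an illustrative analogue of the paper's non-linear program, not the MCMDP transformation itself; the reward values and the threshold fraction `alpha` below are assumed example data.

```python
# Illustrative sketch (not the paper's algorithm): trade off policy entropy
# against expected reward at a single decision point.
import numpy as np
from scipy.optimize import minimize

rewards = np.array([10.0, 6.0, 3.0])    # hypothetical per-action rewards
alpha = 0.8                             # keep at least 80% of the best deterministic reward
reward_floor = alpha * rewards.max()

def neg_entropy(p):
    p = np.clip(p, 1e-12, 1.0)          # avoid log(0)
    return np.sum(p * np.log(p))        # minimizing negative entropy = maximizing entropy

constraints = [
    {"type": "eq",   "fun": lambda p: p.sum() - 1.0},              # valid distribution
    {"type": "ineq", "fun": lambda p: p @ rewards - reward_floor}, # reward guarantee
]
bounds = [(0.0, 1.0)] * len(rewards)
p0 = np.ones(len(rewards)) / len(rewards)  # start from the uniform policy

result = minimize(neg_entropy, p0, bounds=bounds,
                  constraints=constraints, method="SLSQP")
print("randomized policy:", np.round(result.x, 3))
print("expected reward  :", result.x @ rewards, ">=", reward_floor)
```

With a deterministic policy the adversary can predict the single chosen action; the sketch instead spreads probability across actions as widely as the reward floor allows. The full MCMDP problem adds per-agent resource (communication) constraints and team coordination, which is what makes the resulting program non-convex.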
