Coordinating randomized policies for increasing security of agent systems

We consider the problem of providing decision support to a patrolling or security service in an adversarial domain. The goal is to create patrols that achieve a high level of coverage or reward while taking into account the presence of an adversary. We assume that the adversary can learn or observe the patrolling strategy and use this knowledge to its advantage. We follow two different approaches depending on what is known about the adversary. First, when there is no information about the adversary, we represent patrols as a Markov Decision Process (MDP) and identify randomized solutions that minimize the information available to the adversary; this leads to the algorithms CRLP and BRLP for policy randomization of MDPs. Second, when there is partial information about the adversary, we compute efficient patrols by solving a Bayesian Stackelberg game, in which the leader commits first to a patrolling strategy and an adversary, drawn from possibly many adversary types, then selects its best response to that patrol. We provide two efficient mixed-integer programming (MIP) formulations, DOBSS and ASAP, for this NP-hard problem. Our experimental results show the efficiency of these algorithms and illustrate how these techniques yield optimal and secure patrolling policies. These models have been applied in practice: DOBSS is at the heart of the ARMOR system currently deployed at Los Angeles International Airport (LAX) to randomize checkpoints on the roadways entering the airport and canine patrol routes within the airport terminals.
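To make the Stackelberg commitment idea concrete, the sketch below computes an optimal leader mixed strategy for a tiny, made-up two-target patrolling game with a single adversary type, using the multiple-LPs enumeration approach (one linear program per follower pure strategy). This is a minimal illustration only, not the DOBSS or ASAP MIP formulations described above, and the payoff numbers are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical 2x2 payoff matrices for a two-target patrolling example
# (rows: leader pure strategies = which target to patrol,
#  columns: follower pure strategies = which target to attack).
R = np.array([[ 5.0, -1.0],   # leader (defender) payoffs
              [-2.0,  4.0]])
C = np.array([[-3.0,  1.0],   # follower (attacker) payoffs
              [ 2.0, -4.0]])

def optimal_commitment(R, C):
    """Multiple-LPs approach: for each follower pure strategy j, find the
    leader mixed strategy x that maximizes the leader's payoff subject to
    j being a best response for the follower; return the best over all j."""
    m, n = R.shape
    best_value, best_x, best_j = -np.inf, None, None
    for j in range(n):
        # maximize R[:, j] @ x  <=>  minimize -R[:, j] @ x
        c = -R[:, j]
        # best-response constraints: (C[:, k] - C[:, j]) @ x <= 0 for all k != j
        A_ub = np.array([C[:, k] - C[:, j] for k in range(n) if k != j])
        b_ub = np.zeros(A_ub.shape[0])
        A_eq = np.ones((1, m))          # x must be a probability distribution
        b_eq = np.array([1.0])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, 1)] * m, method="highs")
        if res.success and -res.fun > best_value:
            best_value, best_x, best_j = -res.fun, res.x, j
    return best_x, best_j, best_value

x, j, v = optimal_commitment(R, C)
print("leader mixed strategy:", x, "follower best response:", j, "leader value:", v)
```

Handling multiple adversary types (the Bayesian case) requires either enumerating follower strategy combinations across types or a MIP formulation such as DOBSS; the enumeration above is exponential in the number of types and is shown only to convey the commitment-and-best-response structure.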
