Keep the Adversary Guessing: Agent Security by Policy Randomization

Recent advances in the field of agent and multiagent systems bring us closer to agents acting in real-world domains, which can be uncertain and often adversarial. Security, commonly defined as the ability to deal with intentional threats from other agents, is a major challenge for agents or agent teams deployed in these adversarial domains. Such adversarial scenarios arise in a wide variety of increasingly important situations, such as agents patrolling to provide perimeter security around critical infrastructure or performing routine security checks. These domains have the following characteristics: (a) the agent or agent team needs to commit to a security policy, while the adversaries may observe and exploit the policy committed to; (b) the agent or agent team potentially faces different types of adversaries and has varying information available about them (thus limiting the agents' ability to model their adversaries). To address security in such domains, I developed two types of algorithms. First, when the agent has no model of its adversaries, my key idea is to randomize the agent's policies to minimize the information gained by adversaries. To that end, I developed algorithms for policy randomization for both Markov Decision Processes (MDPs) and Decentralized Partially Observable MDPs (Dec-POMDPs). Since arbitrary randomization can violate quality constraints (for example, resource usage must stay below a certain threshold, or key areas must be patrolled with a certain frequency), my algorithms guarantee quality constraints on the randomized policies generated. For efficiency, I provide a novel linear program for randomized policy generation in MDPs, and then build on this program for a heuristic solution for Dec-POMDPs.
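The constrained-randomization idea for MDPs can be illustrated with a small linear program over occupancy measures. The toy MDP below, its reward threshold, and the maximin objective (maximize the smallest occupancy measure, a linear stand-in for the thesis's randomness measure) are all illustrative assumptions, not the thesis's actual formulation; what the sketch shares with it is the structure of the LP: flow-conservation constraints plus an expected-reward quality constraint on the randomized policy.

```python
import numpy as np
from scipy.optimize import linprog

# Toy 2-state, 2-action discounted MDP (all numbers invented for this
# sketch). Transitions are uniform over states for every (s, a).
gamma = 0.9
n_s, n_a = 2, 2
alpha = np.array([0.5, 0.5])            # initial state distribution
r = np.array([2.0, 0.0, 1.0, 0.0])      # reward r[s * n_a + a]

# Variables: x[0..3] = occupancy measures x(s, a); x[4] = t (maximin slack).
# Flow conservation (dual LP of the discounted MDP):
#   sum_a x(s, a) - gamma * sum_{s', a'} P(s | s', a') x(s', a') = alpha(s)
A_eq = np.zeros((n_s, 5))
for s in range(n_s):
    for sp in range(n_s):
        for a in range(n_a):
            A_eq[s, sp * n_a + a] -= gamma * 0.5   # P(s | s', a') = 0.5
    for a in range(n_a):
        A_eq[s, s * n_a + a] += 1.0

# Inequalities: t <= x(s, a) for all (s, a), and expected reward >= 7.0
# (the quality constraint that bounds how much randomization may cost).
A_ub, b_ub = [], []
for i in range(4):
    row = np.zeros(5)
    row[i], row[4] = -1.0, 1.0                     # t - x_i <= 0
    A_ub.append(row)
    b_ub.append(0.0)
row = np.zeros(5)
row[:4] = -r                                       # -sum r x <= -7.0
A_ub.append(row)
b_ub.append(-7.0)

c = np.zeros(5)
c[4] = -1.0                                        # linprog minimizes: max t
res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub,
              A_eq=A_eq, b_eq=alpha, bounds=[(0, None)] * 5)
x = res.x[:4].reshape(n_s, n_a)
policy = x / x.sum(axis=1, keepdims=True)          # randomized policy pi(a|s)
print(policy)  # here the uniform policy already meets the reward threshold
```

In this instance the maximally randomized feasible policy is uniform in both states while still earning an expected discounted reward of 7.5, above the threshold of 7.0; tightening the threshold would force the LP toward a more deterministic, more predictable policy.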
Second, when the agent has a partial model of the adversaries, I model the security domain as a Bayesian Stackelberg game, where the agent's model of the adversary includes a probability distribution over possible adversary types. While optimal policy selection for a Bayesian Stackelberg game is known to be NP-hard, my solution approach, based on an efficient Mixed Integer Linear Program (MILP), provides significant speedups over existing approaches while still obtaining the optimal solution. The resulting policy randomizes over the agent's possible strategies while taking into account the probability distribution over adversary types. Finally, I provide experimental results for all my algorithms, illustrating that the new techniques developed have enabled us to find optimal secure policies efficiently for an increasingly important class of security domains.
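The Bayesian Stackelberg setting can be made concrete on a toy instance. The payoff matrices and type prior below are invented for illustration, and instead of the MILP described above, this sketch uses a simple brute-force baseline: enumerate one pure best response per adversary type and solve one linear program per combination, keeping the best leader commitment found. This enumeration is exponential in the number of types, which is exactly the cost an efficient MILP formulation is designed to avoid, but for a 2x2 game with two types it is a clear, correct reference point.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Illustrative 2x2 Bayesian Stackelberg game with two follower types
# (payoffs invented for this sketch, not taken from the thesis).
p = [0.6, 0.4]                               # prior over adversary types
R = [np.array([[2., 4.], [1., 3.]])] * 2     # leader payoffs per type
C = [np.array([[1., 0.], [0., 1.]]),         # follower payoffs, type 1
     np.array([[0., 1.], [1., 0.]])]         # follower payoffs, type 2

n = 2                                        # leader pure strategies
best_val, best_x = -np.inf, None
# One pure best response per type; each combination yields one LP over the
# leader's mixed strategy x.
for js in itertools.product(range(2), repeat=len(p)):
    # Maximize expected leader utility given each type t plays column js[t].
    c_obj = -sum(p[t] * R[t][:, js[t]] for t in range(len(p)))
    A_ub, b_ub = [], []
    for t, j in enumerate(js):               # j must be a best response:
        for jp in range(2):                  # x . C[t][:, j] >= x . C[t][:, jp]
            if jp != j:
                A_ub.append(C[t][:, jp] - C[t][:, j])
                b_ub.append(0.0)
    res = linprog(c_obj, A_ub=np.array(A_ub), b_ub=b_ub,
                  A_eq=np.ones((1, n)), b_eq=[1.0], bounds=[(0, 1)] * n)
    if res.status == 0 and -res.fun > best_val:
        best_val, best_x = -res.fun, res.x
print(best_x, best_val)   # optimal mixed commitment and its expected value
```

For these payoffs the optimal commitment is the 50/50 mix over the leader's two actions, worth 3.5 in expectation: the two types' best responses pull in opposite directions, so a fully mixed, unpredictable commitment outperforms either pure strategy against the type distribution.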
