Stackelberg games increasingly influence security policies deployed in real-world settings. Much of the work to date focuses on devising a fixed randomized strategy for the defender that accounts for an attacker who responds optimally to it. In practice, defense policies are often subject to constraints and vary over time, allowing an attacker to infer characteristics of future policies from current observations. A defender must therefore account for the attacker's observation capabilities when devising a security policy. We show that this general modeling framework can be captured using stochastic Stackelberg games (SSGs), in which a defender commits to a dynamic policy and the attacker devises an optimal dynamic response. We then offer the following contributions: 1) we show that Markov stationary policies do not suffice in SSGs, except in several very special cases; 2) we present a finite-time mixed-integer nonlinear program for computing a Stackelberg equilibrium in SSGs when the leader is restricted to Markov stationary policies; 3) we present a mixed-integer linear program that approximates it; and 4) we illustrate our algorithms on a simple SSG representing an adversarial patrolling scenario, where we study the impact of attacker patience and risk aversion on optimal defense policies.

Introduction

Recent work using Stackelberg games to model security problems, in which a defender deploys resources to protect targets from an attacker, has proven very successful, both in yielding algorithmic advances (Conitzer and Sandholm 2006; Paruchuri et al. 2008; Kiekintveld et al. 2009; Jain et al. 2010a) and in field applications (Jain et al. 2010b; An et al. 2011). The solutions to these games are Stackelberg equilibria (SE), in which the attacker is assumed to know the defender's mixed strategy and plays a best response to it (breaking ties in favor of the defender makes it a strong SE, or SSE). The defender's task is to pick an optimal (usually mixed) strategy given that the attacker will play a best response to it. The attacker's ability to know the defender's strategy in an SE is motivated, in security problems, by the fact that the attacker can conduct surveillance prior to the actual attack. The simplest Stackelberg games are single-shot zero-sum games. These assumptions keep the computational complexity of finding solutions manageable but limit applicability.

In this paper we approach the problem from the other extreme of generality by addressing SSE computation in general-sum discounted stochastic Stackelberg games (SSGs). Our main contributions are: 1) showing that SSE need not exist in Markov stationary strategies; 2) providing a finite-time general MINLP (mixed-integer nonlinear program) for computing SSE when the leader is restricted to Markov stationary policies; 3) providing an MILP (mixed-integer linear program) for computing approximate SSE in Markov stationary policies with provable approximation bounds; and 4) demonstrating that the generality of SSGs allows us to obtain qualitative insights about security settings for which no alternative techniques exist.
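To ground the strong Stackelberg equilibrium concept discussed above, the following Python sketch computes an SSE of a small single-shot game using the multiple-LP approach of Conitzer and Sandholm (2006): for each attacker action, it solves a linear program that maximizes the defender's expected utility over mixed strategies for which that action is an attacker best response, and keeps the best feasible solution, thereby breaking ties in the defender's favor. The 2x2 payoff matrices and the helper `one_shot_sse` are illustrative placeholders, not taken from this paper.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative 2x2 payoff matrices (rows: defender actions, cols: attacker actions).
# These numbers are made up for the example; they are not from the paper.
R_L = np.array([[ 5.0, -10.0],
                [-8.0,   4.0]])   # defender (leader) payoffs
R_F = np.array([[-3.0,   6.0],
                [ 7.0,  -4.0]])   # attacker (follower) payoffs

def one_shot_sse(R_L, R_F):
    """Strong Stackelberg equilibrium of a one-shot game via multiple LPs:
    for each attacker action j, maximize the defender's expected payoff over
    mixed strategies x under which j is an attacker best response."""
    m, n = R_L.shape
    best = (None, None, -np.inf)
    for j in range(n):
        # Objective: maximize x . R_L[:, j]  (linprog minimizes, so negate).
        c = -R_L[:, j]
        # Best-response constraints: x . R_F[:, k] <= x . R_F[:, j] for all k != j.
        A_ub = np.stack([R_F[:, k] - R_F[:, j] for k in range(n) if k != j])
        b_ub = np.zeros(A_ub.shape[0])
        # x must be a probability distribution over defender actions.
        A_eq, b_eq = np.ones((1, m)), np.array([1.0])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, 1)] * m)
        if res.success and -res.fun > best[2]:
            best = (res.x, j, -res.fun)
    return best  # (defender mixed strategy, attacker best response, defender utility)

x, j, u = one_shot_sse(R_L, R_F)
print("defender strategy:", x, "attacker action:", j, "defender utility:", u)
```

The stochastic setting studied below generalizes this single-shot computation to dynamic policies.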
Notation and Preliminaries

We consider two-player infinite-horizon discounted stochastic Stackelberg games (SSGs from now on) in which one player is a "leader" and the other a "follower". The leader commits to a policy that becomes known to the follower, who plays a best-response policy. These games have a finite state space S, finite action spaces A_L for the leader and A_F for the follower, payoff functions R_L(s, a_l, a_f) and R_F(s, a_l, a_f) for the leader and follower respectively, and a transition function T_{s s'}^{a_l a_f}, where s, s' ∈ S, a_l ∈ A_L, and a_f ∈ A_F. The discount factors are γ_L, γ_F < 1 for the leader and follower, respectively. Finally, β(s) is the probability that the initial state is s. The history of play at time t is h(t) = {s(1) a_l(1) a_f(1) . . . s(t−1) a_l(t−1) a_f(t−1) s(t)}, where the parenthesized indices denote time. Let Π (Φ) be the set of unconstrained, i.e., non-stationary and non-Markov, policies for the leader (follower), i.e., mappings from histories to distributions over actions. Similarly, let Π_MS (Φ_MS) be the set of Markov stationary policies for the leader (follower); these map the last state s(t) to distributions over actions. Finally, for the follower we will also need the set of deterministic Markov stationary policies, denoted Φ_dMS. Let U_L and U_F denote the utility functions for the leader and follower, respectively. For arbitrary policies π ∈ Π and φ ∈ Φ, U_L(π, φ) and U_F(π, φ) denote the corresponding expected discounted utilities.
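As a minimal illustration of the model just defined, the sketch below encodes a toy SSG with the ingredients above (S, A_L, A_F, R_L, R_F, T, γ_L, γ_F, β) and evaluates the players' expected discounted utilities under fixed Markov stationary policies by solving the linear policy-evaluation system V = r + γ P V. All numerical values and the helper `discounted_utility` are illustrative assumptions, not instances from the paper.

```python
import numpy as np

# A toy SSG instance; all numbers are illustrative placeholders.
S, AL, AF = 2, 2, 2                                          # |S|, |A_L|, |A_F|
R_L = np.random.RandomState(0).uniform(-1, 1, (S, AL, AF))   # leader rewards R_L(s, a_l, a_f)
R_F = np.random.RandomState(1).uniform(-1, 1, (S, AL, AF))   # follower rewards R_F(s, a_l, a_f)
T = np.full((S, AL, AF, S), 1.0 / S)                         # transition kernel T_{s s'}^{a_l a_f}
gamma_L, gamma_F = 0.95, 0.9                                 # discount factors
beta = np.array([0.5, 0.5])                                  # initial state distribution

def discounted_utility(pi, phi, R, gamma):
    """Expected discounted utility of Markov stationary policies pi (leader, S x AL)
    and phi (follower, S x AF) for reward R and discount gamma, via policy
    evaluation: V = r + gamma * P V  =>  V = (I - gamma * P)^{-1} r."""
    # Expected one-step reward and induced state-to-state transition matrix.
    r = np.einsum('sa,sb,sab->s', pi, phi, R)
    P = np.einsum('sa,sb,sabt->st', pi, phi, T)
    V = np.linalg.solve(np.eye(S) - gamma * P, r)
    return beta @ V

uniform_L = np.full((S, AL), 1.0 / AL)    # a uniform Markov stationary leader policy
uniform_F = np.full((S, AF), 1.0 / AF)    # a uniform Markov stationary follower policy
print("U_L:", discounted_utility(uniform_L, uniform_F, R_L, gamma_L))
print("U_F:", discounted_utility(uniform_L, uniform_F, R_F, gamma_F))
```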
References

[1] Garth P. McCormick et al. Computability of global solutions to factorable nonconvex programs: Part I — Convex underestimating problems. Mathematical Programming, 1976.
[2] J. Filar et al. Competitive Markov Decision Processes. 1996.
[3] C. Gollier. The Economics of Risk and Time. 2001.
[4] Vincent Conitzer et al. Computing the optimal strategy to commit to. EC '06, 2006.
[5] Sarit Kraus et al. Playing games for security: an efficient exact algorithm for solving Bayesian Stackelberg games. AAMAS, 2008.
[6] Sarit Kraus et al. Multi-robot perimeter patrol in adversarial settings. 2008 IEEE International Conference on Robotics and Automation, 2008.
[7] Manish Jain et al. Computing optimal randomized resource allocations for massive security games. AAMAS, 2009.
[8] Nicola Basilico et al. Leader-follower strategies for robotic patrolling in environments with arbitrary topologies. AAMAS, 2009.
[9] Manish Jain et al. Software Assistants for Randomized Patrol Planning for the LAX Airport Police and the Federal Air Marshal Service. Interfaces, 2010.
[10] Tuomas Sandholm et al. Computing equilibria by incorporating qualitative models? AAMAS, 2010.
[11] Nicola Basilico et al. Asynchronous Multi-Robot Patrolling against Intrusions in Arbitrary Topologies. AAAI, 2010.
[12] Manish Jain et al. Security Games with Arbitrary Schedules: A Branch and Price Approach. AAAI, 2010.
[13] Nicola Basilico et al. A Game-Theoretical Model Applied to an Active Patrolling Camera. 2010 International Conference on Emerging Security Technologies, 2010.
[14] Noa Agmon et al. Multiagent Patrol Generalized to Complex Environmental Conditions. AAAI, 2011.
[15] Nicola Basilico et al. Automated Abstractions for Patrolling Security Games. AAAI, 2011.
[16] Branislav Bosanský et al. Computing time-dependent policies for patrolling games with mobile targets. AAMAS, 2011.
[17] Bo An et al. GUARDS and PROTECT: next generation applications of security games. SECO, 2011.
[18] Bo An et al. Adversarial patrolling games. AAMAS, 2012.