Commitment Without Regrets: Online Learning in Stackelberg Security Games

In a Stackelberg Security Game, a defender commits to a randomized deployment of security resources, and an attacker best-responds by attacking a target that maximizes his utility. Algorithms for computing an optimal strategy for the defender to commit to have had a striking real-world impact, but deployed applications require significant information about potential attackers, which leads to inefficiencies. We address this problem via an online learning approach. We study algorithms that prescribe a randomized strategy for the defender at each time step against an adversarially chosen sequence of attackers, and that receive feedback on their choices by observing either the current attacker type or merely which target was attacked. We design no-regret algorithms whose regret, measured against the best fixed strategy in hindsight, is polynomial in the parameters of the game and sublinear in the number of time steps.
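To make the notion of regret concrete, the following is a minimal sketch of the full-information setting, in which the attacker type is observed after each step. It runs the standard Hedge (exponential weights) update over a small, fixed list of candidate coverage vectors and reports expected regret against the best candidate in hindsight. It is a generic illustration and is not claimed to be the algorithm of this paper. The function names, the restriction to a finite candidate list, the assumption that defender payoffs lie in [0, 1] and do not depend on the attacker type, and the tie-breaking rule are all assumptions made for the example.

```python
import math

def best_response(coverage, att_covered, att_uncovered):
    """Index of the target maximizing the attacker's expected utility.
    Ties go to the lowest index (an assumption of this sketch)."""
    def attacker_utility(i):
        return coverage[i] * att_covered[i] + (1.0 - coverage[i]) * att_uncovered[i]
    return max(range(len(coverage)), key=attacker_utility)

def defender_utility(coverage, target, def_covered, def_uncovered):
    """Defender's expected payoff when `target` is attacked."""
    c = coverage[target]
    return c * def_covered[target] + (1.0 - c) * def_uncovered[target]

def hedge_expected_regret(candidates, attackers, def_covered, def_uncovered):
    """Run Hedge over `candidates` (hypothetical coverage vectors) against a
    sequence of attacker types, each given as (att_covered, att_uncovered).
    Returns best-fixed-candidate payoff minus Hedge's expected payoff."""
    n, horizon = len(candidates), len(attackers)
    eta = math.sqrt(8.0 * math.log(n) / horizon)  # standard tuning for payoffs in [0, 1]
    log_weights = [0.0] * n
    cumulative = [0.0] * n
    expected_total = 0.0
    for att_covered, att_uncovered in attackers:
        # Normalize in log space to avoid overflow.
        top = max(log_weights)
        weights = [math.exp(v - top) for v in log_weights]
        norm = sum(weights)
        probs = [w / norm for w in weights]
        # Full information: the observed type lets us score every candidate.
        gains = [
            defender_utility(c, best_response(c, att_covered, att_uncovered),
                             def_covered, def_uncovered)
            for c in candidates
        ]
        expected_total += sum(p * g for p, g in zip(probs, gains))
        for j, g in enumerate(gains):
            cumulative[j] += g
            log_weights[j] += eta * g
    return max(cumulative) - expected_total
```

With this tuning of the learning rate, the standard Hedge analysis bounds the expected regret by sqrt(T ln(N) / 2) for N candidates and T steps, which is sublinear in T. The partial-feedback case, where only the attacked target is observed, needs a different estimator and is not covered by this sketch.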
