End-to-End Game-Focused Learning of Adversary Behavior in Security Games

Stackelberg security games are a critical tool for maximizing the utility of limited defense resources to protect important targets from an intelligent adversary. Motivated by green security, where the defender may only observe an adversary's response to defense on a limited set of targets, we study the problem of learning a defense that generalizes well to a new set of targets with novel feature values and combinations. Traditionally, this problem has been addressed via a two-stage approach in which an adversary model is trained to maximize predictive accuracy without considering the defender's optimization problem. We develop an end-to-end, game-focused approach in which the adversary model is instead trained to maximize a surrogate for the defender's expected utility. We show, both theoretically and experimentally, that our game-focused approach achieves higher defender expected utility than the two-stage alternative when data are limited.
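To make the contrast concrete, the sketch below trains a toy adversary model directly against a game-focused objective: the defender's expected loss when planning with the model's predictions. Everything here is a simplified illustration, not the paper's method: the quantal-response adversary, the proportional coverage heuristic (a stand-in for the exact Stackelberg optimization), the coverage-penalty constant, and the assumed ground-truth weights `true_w` (used only to evaluate the trained model) are all invented for the example, and the gradient is taken by finite differences rather than by differentiating through the optimization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_targets, n_feats = 5, 3
feats = rng.normal(size=(n_targets, n_feats))   # per-target features
values = rng.uniform(1.0, 5.0, size=n_targets)  # value of each target
true_w = np.array([1.0, 0.5, -0.3])             # hypothetical "true" adversary, for evaluation only

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attack_dist(w, cov):
    # Quantal-response adversary: attack probability rises with modeled
    # attractiveness (feats @ w) and falls with defender coverage.
    return softmax(feats @ w - 3.0 * cov)

def coverage_for(w, budget=1.0):
    # Toy defender: spread a coverage budget in proportion to predicted
    # attack probability times target value (a heuristic stand-in for the
    # exact Stackelberg best response).
    p = attack_dist(w, np.zeros(n_targets))
    weight = p * values
    return budget * weight / weight.sum()

def defender_loss(w):
    # Game-focused objective: expected loss against the true adversary
    # when the defender plans using model w.
    cov = coverage_for(w)
    return float(np.sum(attack_dist(true_w, cov) * values * (1.0 - cov)))

def num_grad(f, w, eps=1e-5):
    # Central finite-difference gradient (the paper differentiates through
    # the optimization instead; this keeps the sketch dependency-free).
    g = np.zeros_like(w)
    for i in range(len(w)):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (f(w + d) - f(w - d)) / (2 * eps)
    return g

w = np.zeros(n_feats)
loss_before = defender_loss(w)
for _ in range(100):
    step = w - 0.05 * num_grad(defender_loss, w)
    if defender_loss(step) >= defender_loss(w):
        break  # accept only improving steps, so the loss is monotone
    w = step
loss_after = defender_loss(w)
print(f"defender loss: {loss_before:.3f} -> {loss_after:.3f}")
```

A two-stage pipeline would instead fit `w` by maximizing the likelihood of observed attacks and only then solve the defender's problem; the point of the game-focused objective is that model errors are penalized exactly insofar as they degrade the defender's utility.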
