Counterfactual regret minimization for integrated cyber and air defense resource allocation

Abstract

This research presents a new application of optimal and approximate solution techniques to resource allocation problems with imperfect information in the cyber and air-defense domains. We develop a two-player, zero-sum, extensive-form game to model attacker and defender roles in both physical and cyber space. We reformulate the problem of finding a Nash equilibrium as an efficient sequence-form linear program. Solving this linear program produces optimal defender strategies for the multi-domain security game. For large problem instances, we apply the approximate counterfactual regret minimization algorithm. This approximation reduces computation time by 95% while maintaining an optimality gap of less than 3%. Applying discounted counterfactual regret minimization reduces computation time by a further 36% relative to the base algorithm. We develop domain insights through a designed experiment that explores the parameter space of the problem and the algorithm. We also address robust opponent exploitation by combining existing techniques to extend the counterfactual regret minimization algorithm with a discounted, constrained variant. A comparison of the robust linear programming, data-biased response, and constrained counterfactual regret approaches clarifies the trade-off between exploitation and exploitability for each method. The robust linear programming approach is the most effective, producing an exploitation-to-exploitability ratio of 10.8 to 1.
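As a rough illustration of the regret-minimization idea named in the abstract, the following minimal sketch runs regret matching in self-play on a tiny zero-sum matrix game, with a discounting step in the style of discounted counterfactual regret minimization. It is not the paper's extensive-form model or implementation. The payoff matrix, the function names, and the discount parameters `alpha`, `beta`, and `gamma` are assumptions chosen for illustration. In a single-decision (normal-form) game, counterfactual regret minimization reduces to this per-action regret update.

```python
import numpy as np


def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = np.maximum(regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # No positive regret yet: fall back to the uniform strategy.
    return np.full(len(regrets), 1.0 / len(regrets))


def solve_matrix_game(payoff, iterations=20000, alpha=1.5, beta=0.0, gamma=2.0):
    """Approximate an equilibrium of a zero-sum matrix game.

    payoff[i, j] is the row player's payoff (a hypothetical defender);
    the column player (a hypothetical attacker) receives -payoff[i, j].
    alpha, beta, gamma are assumed discount parameters, not values
    taken from the paper.
    """
    n_rows, n_cols = payoff.shape
    regret_row, regret_col = np.zeros(n_rows), np.zeros(n_cols)
    avg_row, avg_col = np.zeros(n_rows), np.zeros(n_cols)

    for t in range(1, iterations + 1):
        x = regret_matching(regret_row)
        y = regret_matching(regret_col)

        # Expected payoff of each pure action against the opponent's mix.
        value_row = payoff @ y
        value_col = -(x @ payoff)

        # Instantaneous regret: action value minus current strategy value.
        regret_row += value_row - x @ value_row
        regret_col += value_col - y @ value_col

        # Discount accumulated positive and negative regrets separately.
        pos_w = t**alpha / (t**alpha + 1.0)
        neg_w = t**beta / (t**beta + 1.0)
        regret_row = np.where(regret_row > 0, regret_row * pos_w, regret_row * neg_w)
        regret_col = np.where(regret_col > 0, regret_col * pos_w, regret_col * neg_w)

        # Average strategy in which iteration t carries weight t**gamma.
        decay = ((t - 1) / t) ** gamma
        avg_row = avg_row * decay + x
        avg_col = avg_col * decay + y

    avg_row /= avg_row.sum()
    avg_col /= avg_col.sum()

    # Exploitability: total gain available to best responses (0 at equilibrium).
    gap = (payoff @ avg_col).max() - (avg_row @ payoff).min()
    return avg_row, avg_col, gap


if __name__ == "__main__":
    # Illustrative 2x2 payoffs only; not data from the paper.
    game = np.array([[2.0, -1.0], [-1.0, 1.0]])
    defender, attacker, gap = solve_matrix_game(game)
    print(defender, attacker, gap)
```

The returned gap plays the role of the optimality gap mentioned in the abstract: it measures how much either player could gain by deviating to a best response against the averaged strategies.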
