Multi-agent Reinforcement Learning in Bayesian Stackelberg Markov Games for Adaptive Moving Target Defense

The field of cybersecurity has largely been a cat-and-mouse game, with the discovery of new attacks leading the way. To take away an attacker's advantage of reconnaissance, researchers have proposed proactive defense methods such as Moving Target Defense (MTD). To find good movement strategies, researchers have modeled MTD as leader-follower games between the defender and a cyber-adversary. We argue that existing models are inadequate in sequential settings with incomplete information about a rational adversary and yield sub-optimal movement strategies. Further, while there is an array of work on learning defense policies in sequential settings for cybersecurity, these approaches either suffer from scalability issues arising from incomplete information or ignore the strategic nature of the adversary, simplifying the scenario enough to apply single-agent reinforcement learning techniques. To address these concerns, we propose (1) a unifying game-theoretic model, the Bayesian Stackelberg Markov Game (BSMG), that can capture uncertainty over attacker types along with the nuances of an MTD system, and (2) a Bayesian Strong Stackelberg Q-learning (BSS-Q) approach that, via interaction, learns the optimal movement policy for a BSMG within a reasonable time. We situate BSMGs in the landscape of incomplete-information Markov games and characterize the notion of a Strong Stackelberg Equilibrium (SSE) in them. We show that our learning approach converges to an SSE of a BSMG and then highlight that the learned movement policy (1) improves the state-of-the-art in MTD for web-application security and (2) converges to an optimal policy in MTD domains with incomplete information about adversaries, even when prior information about rewards and transitions is absent.
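To make the learning step concrete, the sketch below illustrates one plausible form of a tabular BSS-Q update in Python/NumPy. It is not the paper's implementation: the function names (stage_sse, bss_q_update), the array layout of the Q-tables, and especially the simplex-sampling approximation of the stage-game equilibrium are all illustrative assumptions (an exact Bayesian Stackelberg solver would typically be used instead). The idea shown is that at each state the defender commits to the mixed strategy maximizing its expected Q-value given that every attacker type best-responds, breaking ties in the defender's favor as an SSE requires, and both players' Q-values are then backed up toward the resulting equilibrium values.

```python
import numpy as np

def stage_sse(Q_D, Q_A, prior, n_samples=2000, rng=None):
    """Approximate the Strong Stackelberg Equilibrium of the Bayesian stage
    game defined by the current Q-values at one state.

    Q_D, Q_A : arrays of shape (n_types, n_def_actions, n_att_actions)
               holding the defender's and each attacker type's payoffs.
    prior    : belief over attacker types (length n_types, sums to 1).

    Returns the defender's mixed strategy, each type's best response, and
    the per-type equilibrium values for defender and attacker.
    """
    rng = rng or np.random.default_rng(0)
    n_types, n_d, _ = Q_D.shape
    best = (-np.inf, None, None, None, None)
    # Crude approximation: sample candidate commitments from the simplex.
    for x in rng.dirichlet(np.ones(n_d), size=n_samples):
        brs, v_d, v_a = [], np.zeros(n_types), np.zeros(n_types)
        for t in range(n_types):
            payoff_a = x @ Q_A[t]                   # attacker's expected payoffs
            ties = np.flatnonzero(np.isclose(payoff_a, payoff_a.max()))
            br = max(ties, key=lambda a: x @ Q_D[t][:, a])  # SSE tie-breaking
            brs.append(br)
            v_d[t], v_a[t] = x @ Q_D[t][:, br], payoff_a[br]
        obj = prior @ v_d                           # defender's expected value
        if obj > best[0]:
            best = (obj, x, brs, v_d, v_a)
    return best[1], best[2], best[3], best[4]

def bss_q_update(Q_D, Q_A, t, s, a_d, a_a, r_d, r_a, s_next, prior,
                 alpha=0.1, gamma=0.9):
    """One tabular backup for attacker type t on the observed transition
    (s, a_d, a_a, r_d, r_a, s_next). Q-tables have shape
    (n_types, n_states, n_def_actions, n_att_actions)."""
    _, _, v_d, v_a = stage_sse(Q_D[:, s_next], Q_A[:, s_next], prior)
    Q_D[t, s, a_d, a_a] += alpha * (r_d + gamma * v_d[t] - Q_D[t, s, a_d, a_a])
    Q_A[t, s, a_d, a_a] += alpha * (r_a + gamma * v_a[t] - Q_A[t, s, a_d, a_a])
```

In a full training loop one would, at each episode, sample the attacker type from the prior, have the defender act (with some exploration) according to the strategy returned by stage_sse at the current state, and apply the update after every transition. The abstract's convergence claim concerns BSS-Q with an exact equilibrium computation; the sampled approximation here only gestures at that procedure.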
