Two Can Play That Game

Cyber-security is an important societal concern. Cyber-attacks have grown both in number and in the damage each attack causes. Large organizations operate a Cyber Security Operation Center (CSOC), which forms the first line of cyber-defense. The inspection of cyber-alerts is a critical part of CSOC operations (the defender, or blue team). Recent work proposed a reinforcement learning (RL) based approach to the defender's decision-making that prevents the cyber-alert queue from growing long enough to overwhelm the defender. In this article, we perform a red team (adversarial) evaluation of this approach. Given the recent attacks on learning-based decision-making systems, it is all the more important to test the limits of the defender's RL approach. Toward that end, we learn several adversarial alert-generation policies, each a best response to a different defender inspection policy. Surprisingly, we find the defender's policies to be quite robust to the attacker's best response. To explain this observation, we extend the earlier defender's RL model to a game model with adversarial RL, and show that there exist defender policies that are robust against any adversarial policy. We also derive a competitive baseline from the game-theoretic model and compare it to the defender's RL approach. However, when we go further and exploit the assumptions made in the Markov Decision Process (MDP) underlying the defender's RL model, we discover an attacker policy that overwhelms the defender. We then use a double-oracle-like approach to retrain the defender with episodes generated by this discovered attacker policy. This makes the defender robust to the discovered attacker policy, and no further harmful attacker policies were found. Overall, adversarial RL and the double oracle approach are general techniques applicable to other uses of RL in adversarial environments.
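
To make the double-oracle-style retraining idea concrete, the following is a minimal, self-contained Python sketch of how such a loop could look on a toy alert-queue game; it is not the paper's implementation. The queue dynamics, the tabular Q-learning best-response oracle, the uniform mixture over opponent policies (used here in place of solving the restricted game for an equilibrium), and all constants are illustrative assumptions.

```python
# Minimal sketch (NOT the paper's implementation) of a double-oracle-style
# adversarial-RL loop on a toy alert-queue game. All names, dynamics, and
# hyperparameters below are illustrative assumptions.
import random
from collections import defaultdict

MAX_QUEUE = 20                   # queue length truncated to [0, MAX_QUEUE]
ATTACKER_ACTIONS = [0, 1, 2, 3]  # alerts injected per step (toy budget)
DEFENDER_ACTIONS = [0, 1, 2]     # alerts inspected per step (toy capacity)
EPISODE_LEN = 50

def step(queue, attack, inspect):
    """Toy queue dynamics: alerts arrive, inspected alerts are removed."""
    queue = min(MAX_QUEUE, max(0, queue + attack - inspect))
    # Zero-sum reward: defender is penalized by queue length, attacker gains it.
    return queue, -queue

def q_learning_best_response(opponent_policies, learn_for_defender,
                             episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning best response against a uniform mixture of the
    opponent's policies (a stand-in for any RL best-response oracle)."""
    actions = DEFENDER_ACTIONS if learn_for_defender else ATTACKER_ACTIONS
    Q = defaultdict(float)
    for _ in range(episodes):
        queue = 0
        opponent = random.choice(opponent_policies)
        for _ in range(EPISODE_LEN):
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(queue, x)])
            opp_a = opponent(queue)
            attack, inspect = (opp_a, a) if learn_for_defender else (a, opp_a)
            next_queue, def_reward = step(queue, attack, inspect)
            r = def_reward if learn_for_defender else -def_reward
            best_next = max(Q[(next_queue, x)] for x in actions)
            Q[(queue, a)] += alpha * (r + gamma * best_next - Q[(queue, a)])
            queue = next_queue
    return lambda q: max(actions, key=lambda x: Q[(q, x)])

# Double-oracle-style loop: alternately add a best response for each side.
defender_policies = [lambda q: 1]   # seed: inspect one alert per step
attacker_policies = [lambda q: 1]   # seed: inject one alert per step
for iteration in range(3):
    attacker_policies.append(
        q_learning_best_response(defender_policies, learn_for_defender=False))
    defender_policies.append(
        q_learning_best_response(attacker_policies, learn_for_defender=True))
    print(f"iteration {iteration}: {len(defender_policies)} defender / "
          f"{len(attacker_policies)} attacker policies")
```

In a full double oracle procedure, each iteration would also solve the restricted game over the accumulated policy sets and terminate once neither side's new best response improves on the current mixed equilibrium; the fixed three-iteration loop above is a simplification.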
