Adversarial Reinforcement Learning in a Cyber Security Simulation

This paper studies cyber-security simulations of computer networks modeled as a Markov game with incomplete information and stochastic elements. The resulting game is an adversarial sequential decision-making problem played by two agents, an attacker and a defender. Each agent is controlled by a reinforcement learning technique, such as a neural network, Monte Carlo learning, or Q-learning, and the techniques are pitted against each other to examine their effectiveness against learning opponents. The results show that Monte Carlo learning with the Softmax exploration strategy is the most effective both at performing the defender role and at learning attacking strategies.
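To make the two ingredients named above concrete, the sketch below shows a tabular Q-learning update paired with Softmax (Boltzmann) exploration, the value-learning and exploration components the paper compares. This is a minimal illustration, not the paper's implementation: the state labels, the `"scan"`/`"exploit"` attacker actions, and all hyperparameter values are hypothetical.

```python
import numpy as np
from collections import defaultdict

def softmax_policy(q_values, temperature=1.0):
    """Boltzmann (softmax) exploration: choose actions with probability
    proportional to exp(Q / temperature)."""
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()           # subtract max for numerical stability
    probs = np.exp(prefs)
    return probs / probs.sum()

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Tiny usage example with hypothetical attacker actions.
actions = ["scan", "exploit"]
Q = defaultdict(float)
q_learning_update(Q, s=0, a="scan", r=1.0, s_next=1, actions=actions)
probs = softmax_policy([Q[(0, a)] for a in actions], temperature=0.5)
```

Lowering the temperature makes the softmax policy greedier, while a high temperature approaches uniform random exploration; the paper's finding is that this temperature-controlled exploration, combined with Monte Carlo value estimates, outperformed the alternatives for both roles.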
