Reinforcement Learning Algorithms for Adaptive Cyber Defense against Heartbleed

In this paper, we investigate a model in which a defender and an attacker simultaneously and repeatedly adjust their defenses and attacks. Under this model, we propose two iterative reinforcement learning algorithms that allow the defender to identify optimal defenses when information about the attacker is limited. The adaptive reinforcement learning algorithm converges, with probability one, to the best response to the attacks when the attacker's exploration of the system diminishes over time. The robust reinforcement learning algorithm converges, with probability arbitrarily close to one, to the min-max strategy even when the attacker persistently explores the system. Convergence of both algorithms is formally proven, and their performance is verified via numerical simulations.
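To make the two convergence targets concrete, the following is a minimal sketch and not the algorithms proposed in the paper. It assumes a small matrix game with a hypothetical defender payoff table `PAYOFF`, and it substitutes a generic sample-average learner with decaying exploration for the paper's methods. The names `best_response`, `maximin_defense`, and `learn_defense`, along with the exploration rates, are illustrative assumptions.

```python
import random

# Hypothetical defender payoff matrix: PAYOFF[d][a] is the defender's payoff
# when defense d meets attack a. The values are illustrative only.
PAYOFF = [
    [-1.0, -4.0, -3.0],
    [-3.0, -1.0, -4.0],
    [-2.0, -2.0, -2.0],
]


def best_response(payoff, attack):
    """Defense with the highest payoff against a known, fixed attack."""
    return max(range(len(payoff)), key=lambda d: payoff[d][attack])


def maximin_defense(payoff):
    """Pure defense that maximizes the defender's worst-case payoff."""
    return max(range(len(payoff)), key=lambda d: min(payoff[d]))


def learn_defense(payoff, attack, steps=5000, seed=0):
    """Payoff-based learning: the defender sees only its own realized payoff.

    Assumption: the attacker prefers `attack` and explores with probability
    1/t, so its exploration diminishes over time.
    """
    rng = random.Random(seed)
    n_def, n_att = len(payoff), len(payoff[0])
    estimate = [0.0] * n_def  # running average payoff per defense
    count = [0] * n_def
    for t in range(1, steps + 1):
        # Attacker deviates from its preferred attack with probability 1/t.
        a = rng.randrange(n_att) if rng.random() < 1.0 / t else attack
        # Defender explores with probability t^(-1/2), otherwise acts greedily.
        if rng.random() < t ** -0.5:
            d = rng.randrange(n_def)
        else:
            d = max(range(n_def), key=lambda i: estimate[i])
        # Incremental sample-average update for the defense just played.
        count[d] += 1
        estimate[d] += (payoff[d][a] - estimate[d]) / count[d]
    return max(range(n_def), key=lambda i: estimate[i])
```

In this toy table, `best_response(PAYOFF, 0)` and `maximin_defense(PAYOFF)` pick different defenses, which mirrors the distinction the abstract draws: a best response is the natural target when the attacker settles on one attack, while the min-max choice guards against an attacker that keeps exploring. Calling `learn_defense(PAYOFF, 0)` runs the learner against a diminishing-exploration attacker; whether it matches the best response on a given run depends on the seed and step count.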
