Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes

A honeynet is a promising active cyber defense mechanism. It reveals the fundamental Indicators of Compromise (IoCs) by luring attackers to conduct adversarial behaviors in a controlled and monitored environment. The active interaction at the honeynet brings a high reward but also introduces high implementation costs and risks of adversarial honeynet exploitation. In this work, we apply infinite-horizon Semi-Markov Decision Process (SMDP) to characterize a stochastic transition and sojourn time of attackers in the honeynet and quantify the reward-risk trade-off. In particular, we design adaptive long-term engagement policies shown to be risk-averse, cost-effective, and time-efficient. Numerical results have demonstrated that our adaptive engagement policies can quickly attract attackers to the target honeypot and engage them for a sufficiently long period to obtain worthy threat information. Meanwhile, the penetration probability is kept at a low level. The results show that the expected utility is robust against attackers of a large range of persistence and intelligence. Finally, we apply reinforcement learning to the SMDP to solve the curse of modeling. Under a prudent choice of the learning rate and exploration policy, we achieve a quick and robust convergence of the optimal policy and value.

[1]  Kevin W. Hamlen,et al.  Autonomous Cyber Deception: Reasoning, Adaptive Planning, and Evaluation of HoneyThings , 2019, Springer International Publishing.

[2]  Sarit Kraus,et al.  Playing games for security: an efficient exact algorithm for solving Bayesian Stackelberg games , 2008, AAMAS.

[3]  Quanyan Zhu,et al.  A Dynamic Bayesian Security Game Framework for Strategic Defense Mechanism Design , 2014, GameSec.

[4]  Ion Bica,et al.  QRASSH - A Self-Adaptive SSH Honeypot Driven by Q-Learning , 2018, 2018 International Conference on Communications (COMM).

[5]  Toshio Nakagawa,et al.  Stochastic Processes: with Applications to Reliability Theory , 2011 .

[6]  Quanyan Zhu,et al.  A cyber-physical game framework for secure and resilient multi-agent autonomous systems , 2015, 2015 54th IEEE Conference on Decision and Control (CDC).

[7]  Quanyan Zhu,et al.  Distributed Privacy-Preserving Collaborative Intrusion Detection Systems for VANETs , 2018, IEEE Transactions on Signal and Information Processing over Networks.

[8]  Quanyan Zhu,et al.  A Bi-Level Game Approach to Attack-Aware Cyber Insurance of Computer Networks , 2017, IEEE Journal on Selected Areas in Communications.

[9]  Quanyan Zhu,et al.  Dynamic policy-based IDS configuration , 2009, Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference.

[10]  Quanyan Zhu,et al.  DISTRIBUTED AND OPTIMAL RESILIENT PLANNING OF LARGE-SCALE INTERDEPENDENT CRITICAL INFRASTRUCTURES , 2018, 2018 Winter Simulation Conference (WSC).

[11]  Quanyan Zhu,et al.  Flip the Cloud: Cyber-Physical Signaling Games in the Presence of Advanced Persistent Threats , 2015, GameSec.

[12]  Quanyan Zhu,et al.  GUIDEX: A Game-Theoretic Incentive-Based Mechanism for Intrusion Detection Networks , 2012, IEEE Journal on Selected Areas in Communications.

[13]  Brian Hay,et al.  A methodology for intelligent honeypot deployment and active engagement of attackers , 2012 .

[14]  Quanyan Zhu,et al.  Factored markov game theory for secure interdependent infrastructure networks , 2018 .

[15]  Quanyan Zhu,et al.  Dynamic Differential Privacy for ADMM-Based Distributed Classification Learning , 2017, IEEE Transactions on Information Forensics and Security.

[16]  L. Spitzner,et al.  Honeypots: Tracking Hackers , 2002 .

[17]  Hongbo Zhu,et al.  Deceptive Attack and Defense Game in Honeypot-Enabled Networks for the Internet of Things , 2016, IEEE Internet of Things Journal.

[18]  Javier García,et al.  A comprehensive survey on safe reinforcement learning , 2015, J. Mach. Learn. Res..

[19]  Sushil Jajodia,et al.  Moving Target Defense - Creating Asymmetric Uncertainty for Cyber Threats , 2011, Moving Target Defense.

[20]  Quanyan Zhu,et al.  Modeling and Analysis of Leaky Deception Using Signaling Games With Evidence , 2018, IEEE Transactions on Information Forensics and Security.

[21]  Michael Kearns,et al.  Near-Optimal Reinforcement Learning in Polynomial Time , 2002, Machine Learning.

[22]  Quanyan Zhu,et al.  Deployment and exploitation of deceptive honeybots in social networks , 2012, 52nd IEEE Conference on Decision and Control.

[23]  Quanyan Zhu,et al.  A Dynamic Games Approach to Proactive Defense Strategies against Advanced Persistent Threats in Cyber-Physical Systems , 2019, Comput. Secur..

[24]  Kishor S. Trivedi,et al.  Optimization for condition-based maintenance with semi-Markov decision process , 2005, Reliab. Eng. Syst. Saf..

[25]  Michael O. Duff,et al.  Reinforcement Learning Methods for Continuous-Time Markov Decision Problems , 1994, NIPS.

[26]  Quanyan Zhu,et al.  Game-Theoretic Methods for Robustness, Security, and Resilience of Cyberphysical Control Systems: Games-in-Games Principle for Optimal Cross-Layer Resilient Control Systems , 2015, IEEE Control Systems.

[27]  Quanyan Zhu,et al.  Security as a Service for Cloud-Enabled Internet of Controlled Things Under Advanced Persistent Threats: A Contract Design Approach , 2017, IEEE Transactions on Information Forensics and Security.

[28]  Radu State,et al.  Self Adaptive High Interaction Honeypots Driven by Game Theory , 2009, SSS.

[29]  Quanyan Zhu,et al.  Proactive Defense Against Physical Denial of Service Attacks Using Poisson Signaling Games , 2017, GameSec.

[30]  Daiyuan Peng,et al.  An SMDP-Based Service Model for Interdomain Resource Allocation in Mobile Cloud Networks , 2012, IEEE Transactions on Vehicular Technology.

[31]  Quanyan Zhu,et al.  A mean-field stackelberg game approach for obfuscation adoption in empirical risk minimization , 2017, 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP).

[32]  Quanyan Zhu,et al.  A Game-theoretic Taxonomy and Survey of Defensive Deception for Cybersecurity and Privacy , 2017, ACM Comput. Surv..

[33]  Oguzhan Alagöz,et al.  Modeling secrecy and deception in a multiple-period attacker-defender signaling game , 2010, Eur. J. Oper. Res..

[34]  Roy D. Yates,et al.  Update or wait: How to keep your data fresh , 2016, IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications.

[35]  Peter Stone,et al.  Transfer Learning for Reinforcement Learning Domains: A Survey , 2009, J. Mach. Learn. Res..

[36]  Yanfei Sun,et al.  Strategic Honeypot Game Model for Distributed Denial of Service Attacks in the Smart Grid , 2017, IEEE Transactions on Smart Grid.

[37]  Quanyan Zhu,et al.  Game-Theoretic Approach to Feedback-Driven Multi-stage Moving Target Defense , 2013, GameSec.

[38]  Fabien Pouget White paper: honeypot, honeynet, honeytoken: terminological issues , 2003 .

[39]  Branislav Bosanský,et al.  Manipulating Adversary's Belief: A Dynamic Game Approach to Deception by Design for Proactive Network Security , 2017, GameSec.

[40]  Quanyan Zhu,et al.  Adaptive Strategic Cyber Defense for Advanced Persistent Threats in Critical Infrastructure Networks , 2018, PERV.

[41]  Quanyan Zhu,et al.  A Stackelberg game perspective on the conflict between machine learning and data obfuscation , 2016, 2016 IEEE International Workshop on Information Forensics and Security (WIFS).

[42]  Yishay Mansour,et al.  Learning Rates for Q-learning , 2004, J. Mach. Learn. Res..

[43]  Quanyan Zhu,et al.  Optimal Timing in Dynamic and Robust Attacker Engagement During Advanced Persistent Threats , 2017, 2019 International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOPT).

[44]  B. Buchanan,et al.  Attributing Cyber Attacks , 2015 .

[45]  Michael Kearns,et al.  Finite-Sample Convergence Rates for Q-Learning and Indirect Algorithms , 1998, NIPS.

[46]  Stephanie Thalberg Markov Decision Processes With Their Applications , 2016 .

[47]  Radha Poovendran,et al.  DIFT Games: Dynamic Information Flow Tracking Games for Advanced Persistent Threats , 2018, 2018 IEEE Conference on Decision and Control (CDC).

[48]  Quanyan Zhu,et al.  Attack-Aware Cyber Insurance for Risk Sharing in Computer Networks , 2015, GameSec.