Paper Information - Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes

Adaptive Honeypot Engagement through Reinforcement Learning of Semi-Markov Decision Processes

A honeynet is a promising active cyber defense mechanism. It reveals the fundamental Indicators of Compromise (IoCs) by luring attackers to conduct adversarial behaviors in a controlled and monitored environment. The active interaction at the honeynet brings a high reward but also introduces high implementation costs and risks of adversarial honeynet exploitation. In this work, we apply an infinite-horizon Semi-Markov Decision Process (SMDP) to characterize the stochastic transitions and sojourn times of attackers in the honeynet and to quantify the reward-risk trade-off. In particular, we design adaptive long-term engagement policies that are shown to be risk-averse, cost-effective, and time-efficient. Numerical results demonstrate that our adaptive engagement policies can quickly attract attackers to the target honeypot and engage them long enough to obtain worthwhile threat information, while keeping the penetration probability low. The results also show that the expected utility is robust against attackers across a wide range of persistence and intelligence levels. Finally, we apply reinforcement learning to the SMDP to address the curse of modeling. With a prudent choice of learning rate and exploration policy, we achieve quick and robust convergence to the optimal policy and value.
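
To make the last step of the abstract concrete, the following is a minimal sketch of a generic discounted SMDP Q-learning update. It is not the authors' implementation. The three-state toy honeynet, the uniform transitions, the exponential sojourn times, the reward values, and all names (`smdp_q_update`, `simulate`, `alpha`, `beta`, `epsilon`) are assumptions chosen only for illustration. The point it shows is that the random sojourn time `tau` enters both the accrued reward and the discount applied to the next state's value.

```python
import math
import random


def smdp_q_update(q, s, a, r_lump, r_rate, tau, s_next, alpha, beta):
    """Apply one SMDP Q-learning step in place and return the new Q(s, a).

    r_lump is a reward received at the decision epoch, r_rate is a reward
    rate earned during the sojourn of length tau, and beta > 0 is the
    continuous-time discount rate.
    """
    discount = math.exp(-beta * tau)
    # Reward earned at a constant rate over [0, tau], discounted continuously.
    accrued = r_rate * (1.0 - discount) / beta
    target = r_lump + accrued + discount * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def simulate(n_steps=5000, alpha=0.1, beta=0.5, epsilon=0.1, seed=0):
    """Run epsilon-greedy SMDP Q-learning on a hypothetical toy model."""
    rng = random.Random(seed)
    n_states, n_actions = 3, 2
    q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        # Epsilon-greedy exploration over the defender's actions.
        if rng.random() < epsilon:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda x: q[s][x])
        # Assumed dynamics: exponential sojourn time, uniform next state.
        tau = rng.expovariate(1.0 + a)
        s_next = rng.randrange(n_states)
        # Assumed rewards: state 2 plays the target honeypot; action 1 costs 0.2.
        r_rate = 1.0 if s == 2 else 0.0
        r_lump = -0.2 * a
        smdp_q_update(q, s, a, r_lump, r_rate, tau, s_next, alpha, beta)
        s = s_next
    return q
```

Calling `simulate()` returns a 3-by-2 table of learned Q-values for the toy model. A constant learning rate is used here for brevity; a decaying rate is the usual choice when convergence guarantees matter.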

Quanyan Zhu | Linan Huang
