Paper Information - Constraints Satisfiability Driven Reinforcement Learning for Autonomous Cyber Defense

Constraints Satisfiability Driven Reinforcement Learning for Autonomous Cyber Defense

With increasing system complexity and attack sophistication, the need for autonomous cyber defense has become evident for cyber and cyber-physical systems (CPSs). Many existing state-of-the-art frameworks either rely on static models with unrealistic assumptions or fail to satisfy system safety and security requirements. In this paper, we present a new hybrid autonomous agent architecture that aims to optimize and verify the defense policies of reinforcement learning (RL) by incorporating constraint verification, using satisfiability modulo theories (SMT), into the agent’s decision loop. Incorporating SMT not only ensures that safety and security requirements are satisfied, but also provides constant feedback that steers RL decision-making toward safe and effective actions. This approach is critically needed for CPSs in which safety or security violations carry high risk. Our evaluation of the approach in a simulated CPS environment shows that the agent learns the optimal policy quickly and defeats diversified attack strategies in 99% of cases.
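The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea: candidate actions are checked against requirements before the agent commits to one, and rejected actions are fed back as a penalty. Everything in it is an assumption made for illustration: the action names, the `budget` and `critical` state fields, the penalty update, and above all the plain Python predicate `satisfies_constraints`, which stands in for a real SMT solver query.

```python
import random

# Hypothetical defense actions with illustrative costs; not taken from the paper.
ACTIONS = {
    "monitor": {"cost": 1, "disrupts_service": False},
    "patch": {"cost": 3, "disrupts_service": False},
    "isolate": {"cost": 5, "disrupts_service": True},
}


def satisfies_constraints(state, action):
    """Stand-in for an SMT query: True if the action meets every requirement."""
    spec = ACTIONS[action]
    within_budget = spec["cost"] <= state["budget"]
    keeps_critical_service = not (spec["disrupts_service"] and state["critical"])
    return within_budget and keeps_critical_service


def select_action(q_values, state, epsilon=0.1, rng=random):
    """Epsilon-greedy choice restricted to verified actions.

    Returns the chosen action and the list of actions the checker rejected.
    """
    safe = [a for a in ACTIONS if satisfies_constraints(state, a)]
    rejected = [a for a in ACTIONS if a not in safe]
    if not safe:
        raise ValueError("no action satisfies the constraints")
    if rng.random() < epsilon:
        return rng.choice(safe), rejected
    return max(safe, key=lambda a: q_values.get(a, 0.0)), rejected


def apply_feedback(q_values, rejected, penalty=1.0, lr=0.5):
    """Move the value estimate of each rejected action toward -penalty."""
    for a in rejected:
        old = q_values.get(a, 0.0)
        q_values[a] = old + lr * (-penalty - old)
    return q_values
```

In this sketch, an action with a high estimated value is still skipped when the checker rejects it, and its estimate is lowered so the agent is less drawn to it later. A real system would replace the predicate with solver-backed constraints and the value table with a trained policy.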
