The Proposal of Double Agent Architecture using Actor-critic Algorithm for Penetration Testing

Reinforcement learning (RL) is a widely used machine learning method for optimal decision-making compared to rule-based methods. Because of that advantage, RL has also recently been used a lot in penetration testing (PT) problems to assist in planning and deploying cyber attacks. Although the complexity and size of networks keep increasing vastly every day, RL is currently applied only for small scale networks. This paper proposes a double agent architecture (DAA) approach that is able to drastically increase the size of the network which can be solved with RL. This work also examines the effectiveness of using current popular deep reinforcement learning algorithms including DQN, DDQN, Dueling DQN and D3QN algorithms for PT. The A2C algorithm using Wolpertinger architecture is also adopted as a baseline for comparing the results of the methods. All algorithms are evaluated using a proposed network simulator which is constructed as a Markov decision process (MDP). Our results demonstrate that DAA with A2C algorithm far outweighs other approaches when dealing with large network environments reaching up to 1000 hosts.