Reinforcement Learning with Safe Exploration for Network Security

Safe reinforcement learning is important for safety-critical applications, especially network security, because exploring dangerous actions can cause severe short-term losses such as network failure or large-scale privacy leakage. In this paper, we propose a reinforcement learning algorithm with safe exploration that uses transfer learning to reduce the initial random exploration. A blacklist records the most dangerous state-action pairs and serves as a safety constraint. A safe deep reinforcement learning version uses a convolutional neural network to estimate risk levels, which further improves the safety of exploration and accelerates learning for the agent. As a case study, the proposed reinforcement learning with safe exploration is applied to anti-jamming robot communications. Experimental results show that, compared with the benchmark algorithms, the proposed algorithms improve the jamming resistance of the robot and reduce the outage rate of entering the most dangerous states.
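The blacklist idea can be illustrated with a minimal tabular sketch. This is not the paper's algorithm: the class name BlacklistQAgent, the reward threshold used to flag a pair as dangerous, the fallback when every action is blacklisted, and the plain Q-learning update are all assumptions made for illustration. Transfer learning and the convolutional risk estimator are omitted.

```python
import random
from collections import defaultdict


class BlacklistQAgent:
    """Tabular Q-learning with a blacklist of dangerous state-action pairs.

    Hypothetical sketch: the risk rule (reward <= risk_threshold) is an
    assumption, not the criterion used in the paper.
    """

    def __init__(self, actions, risk_threshold=-5.0, alpha=0.5,
                 gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.risk_threshold = risk_threshold
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(float)   # Q-values, default 0.0
        self.blacklist = set()        # dangerous (state, action) pairs
        self.rng = random.Random(seed)

    def safe_actions(self, state):
        """Return the actions not blacklisted in this state."""
        allowed = [a for a in self.actions
                   if (state, a) not in self.blacklist]
        # Assumed fallback: if everything is blacklisted, allow all actions.
        return allowed or list(self.actions)

    def act(self, state):
        """Epsilon-greedy choice restricted to non-blacklisted actions."""
        allowed = self.safe_actions(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(allowed)
        return max(allowed, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Blacklist the pair if the reward is very low, then do a Q-update."""
        if reward <= self.risk_threshold:
            self.blacklist.add((state, action))
        best_next = max(self.q[(next_state, a)]
                        for a in self.safe_actions(next_state))
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


agent = BlacklistQAgent(actions=[0, 1, 2])
agent.update("s", 1, -10.0, "s")   # a severe loss blacklists ("s", 1)
choices = {agent.act("s") for _ in range(200)}
```

After the single severe loss, action 1 is excluded from both exploration and exploitation in state "s", so the agent does not revisit the pair while it keeps learning about the remaining actions.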
