Fast Reinforcement Learning for Anti-jamming Communications

This letter presents a fast reinforcement learning algorithm for anti-jamming communications that repeats the previous action with probability $\tau$ and applies the $\epsilon$-greedy policy with probability $1-\tau$. A dynamic threshold based on the average value of several previous actions is designed, and the probability $\tau$ is formulated as a Gaussian-like function to guide the wireless devices. As a concrete example, the proposed algorithm is implemented in a wireless communication system against multiple jammers. Experimental results demonstrate that the proposed algorithm outperforms Q-learning, deep Q-networks (DQN), double DQN (DDQN), and prioritized experience replay based DDQN (PDDQN) in terms of signal-to-interference-plus-noise ratio and convergence rate.
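
The action-selection rule described above can be illustrated with a minimal Python sketch. It is not the letter's implementation: the function name `select_action`, its parameters, and the tabular `q_values` list are hypothetical, and $\tau$ is passed in as a plain number because the abstract does not give the Gaussian-like formula or the dynamic threshold used to compute it.

```python
import random


def select_action(q_values, prev_action, tau, epsilon, rng):
    """Pick an action index from a list of estimated action values.

    Hypothetical helper: with probability tau the previous action is
    repeated; otherwise an epsilon-greedy choice is made over q_values.
    tau is supplied by the caller (its Gaussian-like form is not modeled).
    """
    # Repeat the previous action with probability tau, if one exists.
    if prev_action is not None and rng.random() < tau:
        return prev_action
    # Otherwise explore uniformly at random with probability epsilon.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Otherwise exploit: return the index of the largest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3]  # illustrative values, e.g. one entry per channel
    action = None
    for _ in range(5):
        action = select_action(q, action, tau=0.6, epsilon=0.1, rng=rng)
        print(action)
```

Setting `tau=0` reduces the rule to ordinary $\epsilon$-greedy selection, while a larger `tau` makes the device keep its previous choice more often, which is the behavior the abstract attributes to the proposed algorithm.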