Deep Echo State Q-Network (DEQN) and Its Application in Dynamic Spectrum Sharing for 5G and Beyond

Deep reinforcement learning (DRL) has been shown to be successful in many application domains. Combining recurrent neural networks (RNNs) with DRL further extends DRL to non-Markovian environments by capturing temporal information. However, training both DRL agents and RNNs is known to be challenging, requiring a large amount of training data to achieve convergence. In many targeted applications, such as fifth-generation (5G) cellular communication, the environment is highly dynamic while the available training data is very limited. It is therefore extremely important to develop DRL strategies that capture the temporal correlation of the dynamic environment while requiring limited training overhead. In this article, we introduce the deep echo state Q-network (DEQN), which can adapt to a highly dynamic environment in a short period of time with limited training data. We evaluate the performance of DEQN in the dynamic spectrum sharing (DSS) scenario, a promising technology for increasing spectrum utilization in 5G and future 6G networks. In contrast to a conventional spectrum management policy, which grants a fixed spectrum band to a single system for exclusive access, DSS allows a secondary system to share the spectrum with the primary system. Our work sheds light on the application of an efficient DRL framework in highly dynamic environments with limited available training data.
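The abstract does not spell out the DEQN architecture, so the following is only a minimal sketch of the general idea behind pairing an echo state network with Q-learning: a fixed, randomly initialized recurrent reservoir carries the temporal information, and only a linear readout producing Q-values is trained. Everything here is an assumption made for illustration rather than the authors' design: the class name `EchoStateQ`, the single (non-deep) reservoir, the row-sum rescaling used in place of eigenvalue-based spectral-radius scaling, and all hyperparameter values.

```python
import math
import random


class EchoStateQ:
    """Toy echo state Q-network: fixed random reservoir, trainable linear readout.

    Hypothetical illustration only; not the DEQN architecture from the article.
    """

    def __init__(self, n_obs, n_res, n_actions, radius=0.9, seed=0):
        rng = random.Random(seed)
        # Input and recurrent weights are drawn once and never trained.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_obs)]
                     for _ in range(n_res)]
        w = [[rng.uniform(-1.0, 1.0) for _ in range(n_res)]
             for _ in range(n_res)]
        # Rescale so the largest absolute row sum equals `radius` (< 1).
        # This bounds the spectral radius by `radius`; it is a conservative
        # stand-in for the usual eigenvalue-based scaling.
        row_norm = max(sum(abs(x) for x in row) for row in w)
        self.w_res = [[radius * x / row_norm for x in row] for row in w]
        # Only the readout is learned; it maps reservoir state to Q-values.
        self.w_out = [[0.0] * n_res for _ in range(n_actions)]
        self.state = [0.0] * n_res

    def step(self, obs):
        """Advance the reservoir by one observation and return the new state."""
        new_state = []
        for i in range(len(self.state)):
            pre = sum(wi * o for wi, o in zip(self.w_in[i], obs))
            pre += sum(wr * s for wr, s in zip(self.w_res[i], self.state))
            new_state.append(math.tanh(pre))
        self.state = new_state
        return new_state

    def q_values(self, state):
        """Linear readout: one Q-value per action for the given state."""
        return [sum(w * s for w, s in zip(row, state)) for row in self.w_out]

    def td_update(self, state, action, reward, next_state, gamma=0.9, lr=0.05):
        """One Q-learning step on the readout row of the chosen action."""
        target = reward + gamma * max(self.q_values(next_state))
        error = target - self.q_values(state)[action]
        self.w_out[action] = [w + lr * error * s
                              for w, s in zip(self.w_out[action], state)]
        return error


# Usage: feed two observations, then apply one temporal-difference update.
net = EchoStateQ(n_obs=2, n_res=8, n_actions=3)
s0 = net.step([1.0, 0.0])
s1 = net.step([0.0, 1.0])
net.td_update(s0, action=1, reward=1.0, next_state=s1)
```

Because the recurrent weights stay fixed, each update touches only one row of the readout, which is one reason this family of models is often considered cheap to train on limited data.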
