A Deep Actor-Critic Reinforcement Learning Framework for Dynamic Multichannel Access

To make efficient use of limited spectral resources, we propose a deep actor-critic reinforcement learning framework for dynamic multichannel access. We consider both a single-user case and a multi-user scenario in which several users attempt to access channels simultaneously. The framework is employed as a single agent in the single-user case and extended to a decentralized multi-agent framework in the multi-user scenario. In both cases, we develop actor-critic deep reinforcement learning algorithms and evaluate the learned policies through experiments and numerical results. In the single-user model, we explore different channel switching patterns and switching probabilities to evaluate the performance of the proposed channel access policy and the framework’s tolerance to uncertainty. In the multi-user case, we analyze the probability that each user accesses channels with favorable conditions, as well as the probability of collision. We also consider a time-varying environment to assess how well the proposed framework adapts. Finally, we compare the proposed actor-critic deep reinforcement learning framework, in terms of both average reward and time efficiency, with a deep Q-network (DQN) based approach, random access, and the optimal policy when the channel dynamics are known.
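
To make the single-user setting concrete, the following is a minimal sketch of an actor-critic learner choosing one channel per time slot. It is not the framework proposed here: it replaces the deep networks with lookup tables, and it assumes a simple round-robin switching pattern in which exactly one channel is good and the good channel moves to the next index with probability `switch_prob` in each slot. The function name `run_actor_critic`, the reward values of +1 and -1, the state encoding, and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np


def run_actor_critic(n_channels=4, switch_prob=0.9, steps=20000,
                     actor_lr=0.1, critic_lr=0.1, gamma=0.9, seed=0):
    """Tabular actor-critic on an assumed round-robin channel model.

    Returns the per-slot rewards (+1 for a good channel, -1 otherwise).
    """
    rng = np.random.default_rng(seed)

    # Assumed state: (channel chosen in the last slot, whether it was good).
    n_states = n_channels * 2
    theta = np.zeros((n_states, n_channels))  # actor: softmax preferences
    value = np.zeros(n_states)                # critic: state-value estimates

    good = 0    # index of the currently good channel (hidden from the agent)
    state = 0   # arbitrary initial state
    rewards = np.empty(steps)

    for t in range(steps):
        # Actor: sample a channel from the softmax policy for this state.
        prefs = theta[state] - theta[state].max()
        probs = np.exp(prefs) / np.exp(prefs).sum()
        action = rng.choice(n_channels, p=probs)

        # Environment feedback for the chosen channel.
        hit = action == good
        reward = 1.0 if hit else -1.0
        next_state = action * 2 + int(hit)

        # Critic: one-step temporal-difference error.
        td_error = reward + gamma * value[next_state] - value[state]
        value[state] += critic_lr * td_error

        # Actor: policy-gradient step using the TD error as the advantage.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        theta[state] += actor_lr * td_error * grad_log_pi

        # Assumed channel dynamics: round-robin switch with given probability.
        if rng.random() < switch_prob:
            good = (good + 1) % n_channels

        state = next_state
        rewards[t] = reward

    return rewards


if __name__ == "__main__":
    r = run_actor_critic()
    print("mean reward over the last 5000 slots:", r[-5000:].mean())
```

Under these assumptions, random access over four channels has an expected reward of -0.5 per slot (one chance in four of +1, three in four of -1), so that value serves as a natural baseline when inspecting the printed average. Varying `switch_prob` gives a rough analogue of the switching-probability experiments described above.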
