Cooperative Multi-Agent Reinforcement Learning for Spectrum Management in IoT Cognitive Networks

This paper investigates the application of cooperative Multi-Agent Reinforcement Learning (MARL) schemes to Cognitive Radio Networking (CRN), which can in turn improve spectrum utilization in wireless (ad hoc) networks within the Internet of Things (IoT). These schemes enable wireless transceivers to learn optimal control and configuration under unknown environmental and application conditions by exploiting the potential for cooperation among secondary spectrum users. We provide an overview of existing MARL approaches to CRN and analyze their advantages and weaknesses relative to other CRN approaches. We argue that in typical practical CRN scenarios, including IoT systems, it is essential that the cooperative algorithms be completely decentralized and distributed, and that the agents/nodes be able to compute the optimal strategy collectively even when individual agents cannot. Hence, we propose a new scheme for cooperative spectrum sensing and selection within CRN, based on an adaptation of a recently proposed cooperative MARL scheme, and provide a detailed analysis of its properties and potential performance, indicating its superiority over existing schemes.
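To illustrate the requirement that nodes can jointly compute a strategy that no single node can compute alone, the following minimal sketch may help. It is not the scheme proposed in the paper. It is a generic "local update plus consensus averaging" toy, assuming a stateless setting in which each secondary user can sense only a subset of channels, receives reward 1 when a sensed channel is idle, and exchanges value estimates only with its neighbours. The function name `cooperative_channel_learning`, the parameters `idle_prob`, `senses` and `weights`, and the three-node line topology are all hypothetical choices made for illustration.

```python
import random


def cooperative_channel_learning(idle_prob, senses, weights,
                                 steps=2000, alpha=0.05, seed=0):
    """Toy decentralized value learning for channel selection (illustrative only).

    idle_prob[c]  -- assumed probability that channel c is idle
    senses[i]     -- channels that agent i is able to sense
    weights[i][j] -- row-stochastic consensus weight agent i gives to agent j
                     (zero when i and j are not neighbours)
    Returns q, where q[i][c] is agent i's estimate of channel c's value.
    """
    rng = random.Random(seed)
    n_agents, n_channels = len(senses), len(idle_prob)
    q = [[0.0] * n_channels for _ in range(n_agents)]
    for _ in range(steps):
        # Local step: each agent updates only the channels it can sense.
        for i in range(n_agents):
            for c in senses[i]:
                reward = 1.0 if rng.random() < idle_prob[c] else 0.0
                q[i][c] += alpha * (reward - q[i][c])
        # Consensus step: each agent averages estimates with its neighbours.
        q = [[sum(weights[i][j] * q[j][c] for j in range(n_agents))
              for c in range(n_channels)]
             for i in range(n_agents)]
    return q


# Three agents on a line (0 - 1 - 2); each one senses a single, different channel.
idle_prob = [0.2, 0.5, 0.9]
senses = [[0], [1], [2]]
weights = [[0.50, 0.50, 0.00],
           [0.25, 0.50, 0.25],
           [0.00, 0.50, 0.50]]

q = cooperative_channel_learning(idle_prob, senses, weights)
best = [max(range(len(idle_prob)), key=lambda c: row[c]) for row in q]
print(best)  # each agent's preferred channel
```

In this toy, agent 0 never senses channel 2 and is not a direct neighbour of agent 2, yet its estimate of that channel is driven by information relayed through agent 1. No central controller is involved, which is the property the abstract argues is essential.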
