Reinforcement learning-enabled cooperative spectrum sensing in cognitive radio networks

In cognitive radio (CR) networks, fast and accurate spectrum sensing plays a fundamental role in achieving high spectral efficiency. In this paper, a reinforcement learning (RL)-enabled cooperative spectrum sensing scheme is proposed that allows the secondary users (SUs) to determine the order in which they scan the channels and to select a partner for cooperative spectrum sensing. Using a Q-learning approach, each SU learns the occupancy pattern of the primary channels and thereby forms a dynamic scanning preference list, which reduces the scanning overhead and the access delay. To improve detection efficiency in a dynamic environment, a discounted upper confidence bound (D-UCB)-based cooperation partner selection algorithm is devised, in which each SU learns the time-varying detection probabilities of its neighbors and selects the neighbor with the potentially highest detection probability as its cooperation partner. Simulation results demonstrate that the proposed scheme achieves significant performance gains over various reference algorithms in terms of scanning overhead, access delay, and detection efficiency.
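The abstract does not give the state, action, or reward definitions of the Q-learning scheme, so the following is only a minimal Python sketch of how a scanning preference list could be learned, under loudly stated assumptions: a single-state formulation in which the action is the channel to scan, the reward is +1 if the channel is found idle and -1 if it is busy, and the preference list ranks channels by descending Q-value. The class `ScanOrderLearner`, the helper `simulate_scanning`, all parameter values, and the toy environment with independent per-slot idle probabilities are hypothetical and are not taken from the paper.

```python
import random


class ScanOrderLearner:
    """Hypothetical sketch of Q-learning for the channel scanning order.

    Single-state formulation (an assumption): the action is the channel to
    scan, the reward is +1 if the channel is idle and -1 if it is busy, and
    the scanning preference list ranks channels by their Q-values.
    """

    def __init__(self, n_channels, alpha=0.05, gamma=0.0, epsilon=0.1, seed=None):
        self.q = [0.0] * n_channels  # Q-value per channel
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor (0 = myopic)
        self.epsilon = epsilon        # probability of a random scan order
        self.rng = random.Random(seed)

    def preference_list(self):
        """Channels sorted by descending Q-value, occasionally shuffled to explore."""
        order = sorted(range(len(self.q)), key=lambda c: self.q[c], reverse=True)
        if self.rng.random() < self.epsilon:
            self.rng.shuffle(order)
        return order

    def update(self, channel, idle):
        """Q-learning update; the next state is the same single state."""
        reward = 1.0 if idle else -1.0
        target = reward + self.gamma * max(self.q)
        self.q[channel] += self.alpha * (target - self.q[channel])

    def scan(self, occupied):
        """Scan in preference order until an idle channel is found.

        occupied[c] is True when a primary user holds channel c.
        Returns (idle channel or None, number of channels scanned).
        """
        for n, c in enumerate(self.preference_list(), start=1):
            idle = not occupied[c]
            self.update(c, idle)
            if idle:
                return c, n
        return None, len(self.q)


def simulate_scanning(p_idle, slots, seed=0):
    """Toy environment (assumption): channel c is idle with probability
    p_idle[c], independently in every slot."""
    env = random.Random(seed)
    learner = ScanOrderLearner(len(p_idle), seed=seed + 1)
    scans = []
    for _ in range(slots):
        occupied = [env.random() >= p for p in p_idle]
        _, n = learner.scan(occupied)
        scans.append(n)
    return learner, scans


if __name__ == "__main__":
    learner, scans = simulate_scanning([0.1, 0.3, 0.95, 0.5], slots=3000)
    learner.epsilon = 0.0  # turn off exploration to read the learned order
    print("preference list:", learner.preference_list())
    print("mean channels scanned (last 500 slots):", sum(scans[-500:]) / 500)
```

With `gamma=0` the update reduces to an exponentially weighted average of the rewards, which is enough to rank channels by their idle probability; exploiting temporal correlation in primary-user traffic would require a richer state, for example the last observed occupancy of each channel.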
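The partner-selection step can be sketched with the standard D-UCB index: for each neighbor k, the discounted empirical mean of its rewards plus a padding term 2B·sqrt(ξ·log n_t(γ) / N_t(γ, k)), where N_t(γ, k) is the discounted number of times neighbor k has been chosen, n_t(γ) is the sum of these counts over all neighbors, and B bounds the reward. The reward used here (1 if the chosen neighbor's report turned out to be a correct detection, 0 otherwise) assumes the SU can verify detections after the fact; that definition, the names `DiscountedUCB` and `simulate_partner_selection`, the values of γ and ξ, and the toy environment with an abrupt change in detection probabilities are all hypothetical illustrations, not the paper's exact algorithm.

```python
import math
import random


class DiscountedUCB:
    """Discounted UCB (D-UCB) for choosing a cooperation partner.

    Arms are neighbouring SUs. Reward (hypothetical definition for this
    sketch): 1 if the chosen neighbour's report was a correct detection,
    0 otherwise, so the discounted mean tracks a time-varying detection
    probability.
    """

    def __init__(self, n_arms, gamma=0.98, xi=0.6, bound=1.0):
        self.gamma = gamma  # discount factor; smaller forgets faster
        self.xi = xi        # exploration constant
        self.bound = bound  # B: upper bound on the reward
        self.counts = [0.0] * n_arms  # discounted counts N_t(gamma, k)
        self.sums = [0.0] * n_arms    # discounted reward sums

    def select(self):
        """Return the neighbour with the largest D-UCB index."""
        for k, n in enumerate(self.counts):
            if n == 0.0:  # untried neighbour: choose it first
                return k
        total = sum(self.counts)  # n_t(gamma), always >= 1 here

        def index(k):
            mean = self.sums[k] / self.counts[k]
            pad = 2 * self.bound * math.sqrt(self.xi * math.log(total) / self.counts[k])
            return mean + pad

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        """Discount all statistics by gamma, then add the new observation."""
        self.counts = [self.gamma * n for n in self.counts]
        self.sums = [self.gamma * s for s in self.sums]
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def simulate_partner_selection(p_before, p_after, slots, switch, seed=0):
    """Toy environment (assumption): neighbour k detects correctly with
    probability p_before[k] before slot `switch` and p_after[k] afterwards."""
    rng = random.Random(seed)
    agent = DiscountedUCB(len(p_before))
    choices = []
    for t in range(slots):
        p = p_before if t < switch else p_after
        k = agent.select()
        reward = 1.0 if rng.random() < p[k] else 0.0
        agent.update(k, reward)
        choices.append(k)
    return choices


if __name__ == "__main__":
    choices = simulate_partner_selection(
        [0.9, 0.6, 0.5], [0.5, 0.6, 0.95], slots=4000, switch=2000)
    for name, window in (("before switch", choices[1500:2000]),
                         ("after switch", choices[3500:])):
        print(name, [window.count(k) for k in range(3)])
```

A smaller γ forgets stale observations faster, so the SU reacts sooner when a neighbor's detection probability changes, at the cost of noisier estimates and more exploration.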
