Multi-armed Bandit Online Learning Based on POMDP in Cognitive Radio

In cognitive radio, most existing research on spectrum sharing has two weaknesses. First, the problem is usually formulated as a Markov decision process (MDP), which requires complete knowledge of the channel. Second, most studies perform online learning based on a perceived (sensed) channel. To address these problems, this paper proposes a new algorithm in which the secondary user transmits conservatively at a low rate if the licensed (primary) user is present in the current channel, and transmits aggressively otherwise. Because the channel state is not directly observable during conservative transmission, the problem becomes a partially observable Markov decision process (POMDP). We first establish the optimal threshold policy when the channel is known, and then consider optimal transmission when the channel is unknown, modeling it as a multi-armed bandit problem. We obtain the optimal K-conservative policy with the UCB algorithm and improve the convergence speed with the UCB-TUNED algorithm. Simulation and analysis results show that multi-armed bandit online learning under an incompletely known channel yields the same K-conservative policy as the optimal threshold policy under a known channel, and that UCB-TUNED improves the convergence speed.
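The abstract does not spell out the learning loop, so the following Python sketch shows one way UCB1 and UCB-Tuned could select among K-conservative policies on a simulated Gilbert-Elliott (two-state Markov) channel. Everything concrete here is an assumption for illustration: the transition probabilities `P01` and `P11`, the normalized rates `R_CONS` and `R_AGG`, the block length `BLOCK`, all helper names, and the reading of a K-conservative policy as "transmit aggressively while it succeeds; after a failed aggressive slot, transmit conservatively for K slots, then try aggressively again". Each arm is one value of K. The UCB1 index is the empirical mean plus sqrt(2 ln n / n_j); the UCB-Tuned index replaces the factor 2 with min(1/4, V_j), where V_j is the empirical variance plus sqrt(2 ln n / n_j).

```python
import math
import random

# Hypothetical parameters for illustration only (not taken from the paper).
P01 = 0.2     # P(bad -> good) of the Gilbert-Elliott channel
P11 = 0.8     # P(good -> good)
R_CONS = 0.4  # normalized reward of a conservative (low-rate) slot; always delivered
R_AGG = 1.0   # normalized reward of a successful aggressive (high-rate) slot
BLOCK = 50    # slots simulated each time an arm (a value of K) is played


class GilbertElliottChannel:
    """Two-state Markov channel; `good` is True in the good state."""

    def __init__(self, rng):
        self.rng = rng
        # Start from the stationary distribution P(good) = P01 / (P01 + P10).
        self.good = rng.random() < P01 / (P01 + 1.0 - P11)

    def step(self):
        self.good = self.rng.random() < (P11 if self.good else P01)
        return self.good


def run_k_conservative(k, channel, slots):
    """Average per-slot reward of the (assumed) K-conservative policy.

    Transmit aggressively while it succeeds; after a failed aggressive slot,
    transmit conservatively for k slots, then try aggressively again.
    """
    total, wait = 0.0, 0
    for _ in range(slots):
        good = channel.step()
        if wait > 0:          # conservative slot: delivered, channel state unobserved
            total += R_CONS
            wait -= 1
        elif good:            # aggressive slot on a good channel: success
            total += R_AGG
        else:                 # aggressive slot on a bad channel: failure, back off
            wait = k
    return total / slots


def ucb_select(counts, sums, sq_sums, n, tuned):
    """Index of the arm with the largest UCB1 (or UCB-Tuned) index at round n."""
    best, best_index = 0, -math.inf
    for j, nj in enumerate(counts):
        if nj == 0:
            return j          # play every arm once before using the index
        mean = sums[j] / nj
        if tuned:
            # Empirical variance plus its own confidence term, capped at 1/4.
            var = sq_sums[j] / nj - mean * mean + math.sqrt(2.0 * math.log(n) / nj)
            bonus = math.sqrt(math.log(n) / nj * max(0.0, min(0.25, var)))
        else:
            bonus = math.sqrt(2.0 * math.log(n) / nj)
        if mean + bonus > best_index:
            best, best_index = j, mean + bonus
    return best


def learn(num_arms=6, rounds=2000, tuned=False, seed=0):
    """Play arms K = 0..num_arms-1 for `rounds` rounds; return play counts per arm."""
    rng = random.Random(seed)
    channel = GilbertElliottChannel(rng)
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    sq_sums = [0.0] * num_arms
    for n in range(1, rounds + 1):
        k = ucb_select(counts, sums, sq_sums, n, tuned)
        reward = run_k_conservative(k, channel, BLOCK)  # lies in [0, 1]
        counts[k] += 1
        sums[k] += reward
        sq_sums[k] += reward * reward
    return counts


if __name__ == "__main__":
    for tuned in (False, True):
        counts = learn(tuned=tuned)
        best_k = max(range(len(counts)), key=counts.__getitem__)
        print("UCB-Tuned" if tuned else "UCB1", "most-played K =", best_k, counts)
```

The most-played K approximates the learned policy. Because the channel state carries over between blocks, successive rewards of an arm are not independent, so the usual UCB regret guarantees hold only approximately for this sketch.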
