Distributed learning under imperfect sensing in cognitive radio networks

We consider a cognitive radio network in which M distributed secondary users search for spectrum opportunities among N independent channels without exchanging information. The occupancy of each channel by the primary network is modeled as a Bernoulli process with unknown mean, which represents the unknown traffic load of the primary network. In each slot, a secondary transmitter chooses one channel to sense and transmits if the channel is sensed to be idle. Sensing is imperfect: an idle channel may be sensed as busy and vice versa. Secondary users transmitting on the same channel collide, and none of them transmits successfully. The objective is to maximize the system throughput under the collision constraint imposed by the primary network while ensuring that each secondary transmitter and its receiver select channels in a synchronized manner. The performance of a channel selection policy is measured by the system regret, defined as the expected total performance loss with respect to the optimal performance in the ideal scenario where all channel means are known to all users and collisions among secondary users are eliminated through perfect scheduling. We show that the optimal system regret has the same logarithmic order as in the centralized counterpart with perfect sensing. An order-optimal decentralized policy is constructed that achieves this logarithmic order of the system regret while ensuring fairness among all users. A notational sketch of the regret and a minimal learning example are given below.
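In notation introduced here only for illustration (the abstract itself does not fix symbols), let theta_1, ..., theta_N be the unknown channel idle probabilities, sigma an ordering with theta_sigma(1) >= ... >= theta_sigma(N), U(theta) the expected per-slot throughput of a lone user on a channel with idle probability theta under the given sensing error probabilities, and Y(t) the number of successful secondary transmissions in slot t. The system regret of a policy Phi after T slots then takes the form

    R_Phi(T) = T * \sum_{j=1}^{M} U(theta_sigma(j)) - E_Phi[ \sum_{t=1}^{T} Y(t) ],

and "logarithmic order" means R_Phi(T) grows as O(log T).

The following is a minimal single-user sketch of the learning component only. It uses the standard UCB1 index of Auer et al. [3] rather than the specific decentralized policy constructed in the paper, and it omits the multi-user coordination and fairness mechanisms. The class name, the reward model (success when the chosen channel is idle and not falsely sensed busy), and the assumption of a common, known false-alarm probability are illustrative assumptions, not taken from the paper.

    import numpy as np

    class UCBChannelSelector:
        """Illustrative single-user UCB1-style channel selector (not the paper's policy)."""

        def __init__(self, num_channels: int):
            self.n = num_channels
            self.counts = np.zeros(num_channels)   # times each channel has been sensed
            self.rewards = np.zeros(num_channels)  # accumulated successful transmissions
            self.t = 0

        def select(self) -> int:
            self.t += 1
            # Sense each channel once before applying the UCB index.
            untried = np.where(self.counts == 0)[0]
            if untried.size > 0:
                return int(untried[0])
            means = self.rewards / self.counts
            ucb = means + np.sqrt(2.0 * np.log(self.t) / self.counts)
            return int(np.argmax(ucb))

        def update(self, channel: int, success: bool) -> None:
            self.counts[channel] += 1
            self.rewards[channel] += float(success)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        theta = np.array([0.2, 0.5, 0.8])  # hypothetical idle probabilities (ground truth)
        eps = 0.1                          # assumed false-alarm probability, common to all channels
        selector = UCBChannelSelector(len(theta))
        for _ in range(10_000):
            ch = selector.select()
            idle = rng.random() < theta[ch]
            # Simplification: busy channels are always sensed busy (no miss detection);
            # an idle channel is lost only through a false alarm.
            success = idle and (rng.random() > eps)
            selector.update(ch, success)
        print("sensing counts per channel:", selector.counts)

Under these simplifying assumptions a uniform false-alarm probability scales all channel means by the same factor, so ranking channels by observed success rate still identifies the best channels; this is why a UCB-style index remains applicable when sensing is imperfect, although the paper's policy must additionally handle miss detections, the primary-network collision constraint, and coordination among multiple users without information exchange.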
