An Order Optimal Policy for Exploiting Idle Spectrum in Cognitive Radio Networks

In this paper, a spectrum sensing policy employing recency-based exploration is proposed for cognitive radio networks. We formulate the problem of finding a spectrum sensing policy for multiband dynamic spectrum access as a stochastic restless multiarmed bandit problem with stationary unknown reward distributions. In cognitive radio networks, the multiarmed bandit problem arises when deciding where in the radio spectrum to look for idle frequencies that could be efficiently exploited for data transmission. We consider two models for the dynamics of the frequency bands: 1) the independent model, where the state of a band evolves randomly and independently of its past, and 2) the Gilbert-Elliott model, where the states evolve according to a two-state Markov chain. It is shown that, under these conditions, the proposed sensing policy attains asymptotically logarithmic weak regret. The proposed policy is an index policy, in which the index of a frequency band comprises a sample mean term and a recency-based exploration bonus term. The sample mean promotes spectrum exploitation, whereas the exploration bonus encourages further exploration for idle bands providing high data rates. The recency-based approach makes it straightforward to construct the exploration bonus so that the time interval between consecutive sensing instants of a suboptimal band grows exponentially, which in turn leads to logarithmically increasing weak regret. Simulation results confirming logarithmic weak regret are presented, and the proposed policy is found to often provide improved performance at low complexity compared with other state-of-the-art policies in the literature.
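The following minimal Python sketch illustrates the index structure described above: a sample mean plus a bonus that depends only on how recently a band was sensed, run on restless bands that follow either the independent or the Gilbert-Elliott model. Several parts are assumptions made for illustration and are not taken from the paper:

* the bonus form `sqrt(ln(t / last_sensed))`,
* the function names `recency_index`, `next_states` and `simulate`,
* data rates normalized to [0, 1],
* the initialization step that senses each band once.

It is not the authors' published algorithm.

```python
import math
import random


def recency_index(sample_mean, last_sensed, t):
    """Illustrative index: sample mean plus a recency-based exploration bonus.

    ASSUMPTION: the bonus sqrt(ln(t / last_sensed)) is a hypothetical choice.
    It grows only while a band stays unsensed. last_sensed < t always holds
    here, so the log argument exceeds 1 and the bonus is positive.
    """
    return sample_mean + math.sqrt(math.log(t / last_sensed))


def next_states(states, p_busy_to_idle, p_idle_to_busy, rng):
    """Advance every band by one step (restless: all bands evolve, sensed or not).

    Each band is a two-state Markov chain (1 = idle, 0 = busy), i.e. the
    Gilbert-Elliott model. Setting p_busy_to_idle = 1 - p_idle_to_busy gives
    the independent model, where the next state ignores the current one.
    """
    new_states = []
    for s, a, b in zip(states, p_busy_to_idle, p_idle_to_busy):
        if s == 1:
            new_states.append(0 if rng.random() < b else 1)
        else:
            new_states.append(1 if rng.random() < a else 0)
    return new_states


def simulate(p_busy_to_idle, p_idle_to_busy, rates, horizon, seed=0):
    """Sense one band per time step and return how often each band was sensed.

    The reward is the band's (normalized) rate if the sensed band is idle, else 0.
    """
    rng = random.Random(seed)
    k = len(rates)
    states = [1] * k  # arbitrary initial states (assumption)
    counts, sums, last = [0] * k, [0.0] * k, [0] * k
    for t in range(1, horizon + 1):
        states = next_states(states, p_busy_to_idle, p_idle_to_busy, rng)
        if t <= k:
            band = t - 1  # sense each band once to initialize its sample mean
        else:
            band = max(
                range(k),
                key=lambda j: recency_index(sums[j] / counts[j], last[j], t),
            )
        reward = rates[band] if states[band] == 1 else 0.0
        counts[band] += 1
        sums[band] += reward
        last[band] = t  # recency: remember when this band was last sensed
    return counts


if __name__ == "__main__":
    # Independent model with made-up idle probabilities 0.9, 0.2 and 0.1.
    idle = [0.9, 0.2, 0.1]
    print("independent:", simulate(idle, [1 - q for q in idle], [1.0] * 3, 5000))
    # Gilbert-Elliott bands; the stationary idle probability is a / (a + b).
    print("Gilbert-Elliott:", simulate([0.5, 0.1], [0.05, 0.4], [1.0] * 2, 5000, seed=1))
```

The first band in each run is expected to receive the large majority of sensing slots.

Why a recency bonus can give logarithmic regret: suppose the best band is sensed almost every step, so its bonus is near zero. Under the assumed bonus, a suboptimal band with gap Δ is then re-sensed roughly once t ≥ τ·exp(Δ²), where τ is its last sensing instant. Its sensing instants therefore spread out geometrically, and about ln(n)/Δ² of them occur by time n. This matches, in spirit, the exponential growth of sensing intervals described in the abstract.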

Visa Koivunen | Jan Oksanen