Dynamic Spectrum Access in realistic environments using reinforcement learning

We study the use of reinforcement learning to model Dynamic Spectrum Access in a realistic multi-channel environment. Three approaches from the literature on the multi-armed bandit problem are compared on a set of realistic channel access models: two are based on stochastic models of channel occupancy, while the third assumes an adversarial model. The algorithms are tested experimentally on channels occupied by primary users that behave according to a simple fair scheduler and according to a semi-Markov model fitted to WLAN traffic measurements; both generate more realistic channel occupancy patterns than fixed i.i.d. probability models allow. The experiments show that the UCB1 algorithm of Auer et al. [9] outperforms the other algorithms, and we support these findings with some simple theoretical results.
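For context, UCB1 treats each channel as a bandit arm and selects the one maximizing an empirical mean plus an exploration bonus. The sketch below is a generic illustration of the standard UCB1 index rule, not the paper's experimental code; the Bernoulli channel-free probabilities are illustrative assumptions.

```python
import math
import random

def ucb1(free_probs, horizon, seed=0):
    """Run UCB1 over channels modelled as Bernoulli arms.

    free_probs: probability each channel is free (reward 1) in a slot.
    Returns (total reward, per-channel play counts).
    """
    rng = random.Random(seed)
    k = len(free_probs)
    counts = [0] * k      # times each channel was sensed
    sums = [0.0] * k      # cumulative reward per channel
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # sense each channel once to initialise
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_i)
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < free_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

On i.i.d. channels the logarithmic exploration bonus shrinks as a channel accumulates observations, so play concentrates on the channel with the highest empirical availability while still revisiting the others occasionally.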

[1]  T. L. Lai and Herbert Robbins  Asymptotically Efficient Adaptive Allocation Rules , 1985 .

[2]  P. Whittle Restless Bandits: Activity Allocation in a Changing World , 1988 .

[3]  Qing Zhao,et al.  A Restless Bandit Formulation of Opportunistic Access: Indexability and Index Policy , 2008, 2008 5th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops.

[4]  Zhu Han,et al.  Distributed Cognitive Sensing for Time Varying Channels: Exploration and Exploitation , 2010, 2010 IEEE Wireless Communication and Networking Conference.

[5]  Lang Tong,et al.  A Measurement-Based Model for Dynamic Spectrum Access in WLAN Channels , 2006, MILCOM 2006 - 2006 IEEE Military Communications Conference.

[6]  Ananthram Swami,et al.  Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret , 2010, IEEE Journal on Selected Areas in Communications.

[7]  Peter Auer,et al.  The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[8]  H. Vincent Poor,et al.  Cognitive Medium Access: Exploration, Exploitation, and Competition , 2007, IEEE Transactions on Mobile Computing.

[9]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.