Dynamic Spectrum Access in realistic environments using reinforcement learning

We study the use of reinforcement learning to model Dynamic Spectrum Access in a realistic multi-channel environment. Three approaches from the literature on the multi-armed bandit problem are compared on a set of realistic channel access models: two are based on stochastic models of channel occupancy, while the third assumes an adversarial model. The algorithms are tested experimentally on channels occupied by primary users that behave according to a simple fair scheduler and according to a semi-Markov model fitted to WLAN traffic measurements; both generate more realistic channel occupancy patterns than fixed i.i.d. probability models allow. The experiments show that the UCB1 algorithm of Auer et al. [9] outperforms the other algorithms, and we support these findings with some simple theoretical results.
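For context, UCB1 treats each channel as a bandit arm and selects the one maximizing an empirical mean plus an exploration bonus. The sketch below is a generic illustration of the standard UCB1 index rule, not the paper's experimental code; the Bernoulli channel-free probabilities are illustrative assumptions.

```python
import math
import random

def ucb1(free_probs, horizon, seed=0):
    """Run UCB1 over channels modelled as Bernoulli arms.

    free_probs: probability each channel is free (reward 1) in a slot.
    Returns (total reward, per-channel play counts).
    """
    rng = random.Random(seed)
    k = len(free_probs)
    counts = [0] * k      # times each channel was sensed
    sums = [0.0] * k      # cumulative reward per channel
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # sense each channel once to initialise
        else:
            # UCB1 index: empirical mean + sqrt(2 ln t / n_i)
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2.0 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < free_probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

On i.i.d. channels the logarithmic exploration bonus shrinks as a channel accumulates observations, so play concentrates on the channel with the highest empirical availability while still revisiting the others occasionally.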

[1]  T. L. Lai and Herbert Robbins  Asymptotically Efficient Adaptive Allocation Rules , 1985 .

[2]  P. Whittle Restless Bandits: Activity Allocation in a Changing World , 1988 .

[3]  Qing Zhao,et al.  A Restless Bandit Formulation of Opportunistic Access: Indexability and Index Policy , 2008, 2008 5th IEEE Annual Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks Workshops.

[4]  Zhu Han,et al.  Distributed Cognitive Sensing for Time Varying Channels: Exploration and Exploitation , 2010, 2010 IEEE Wireless Communication and Networking Conference.

[5]  Lang Tong,et al.  A Measurement-Based Model for Dynamic Spectrum Access in WLAN Channels , 2006, MILCOM 2006 - 2006 IEEE Military Communications Conference.

[6]  Ananthram Swami,et al.  Distributed Algorithms for Learning and Cognitive Medium Access with Logarithmic Regret , 2010, IEEE Journal on Selected Areas in Communications.

[7]  Peter Auer,et al.  The Nonstochastic Multiarmed Bandit Problem , 2002, SIAM J. Comput..

[8]  H. Vincent Poor,et al.  Cognitive Medium Access: Exploration, Exploitation, and Competition , 2007, IEEE Transactions on Mobile Computing.

[9]  Peter Auer,et al.  Finite-time Analysis of the Multiarmed Bandit Problem , 2002, Machine Learning.