Reinforcement Learning Real Experiments for Opportunistic Spectrum Access

This paper analyzes experimental results from what is, to the authors' knowledge, the first implementation worldwide on real signals of reinforcement learning algorithms for cognitive radio decision making in an opportunistic spectrum access (OSA) context. Two algorithms that can operate in highly unpredictable conditions are compared: UCB (Upper Confidence Bound) and WD (Weight Driven). The OSA scenario is run in laboratory conditions using two USRP N210 platforms. One platform plays the role of the primary network and generates signals in a set of frequency bands, each with a pre-defined mean vacancy probability. The signals use an OFDM modulation scheme generated in the GNU Radio Companion (GRC) environment. The other platform runs Simulink and plays the role of the secondary user (SU) cognitive engine, which performs the learning. The experimental results illustrate how the SU learns and predicts the channels' vacancy using the UCB and WD algorithms. They validate, in real conditions, the capabilities of machine learning algorithms for opportunistic spectrum access in terms of learning speed and convergence accuracy. They also allow the performance of UCB and WD to be compared.
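As an illustration of the kind of learning loop described above, the following is a minimal simulation sketch of a UCB-style channel selection policy. It is not the authors' Simulink implementation: it assumes the textbook UCB1 index (empirical mean plus an exploration bonus of sqrt(alpha * ln t / n)), models each channel as vacant independently in each slot with a fixed probability, and uses hypothetical names (`ucb1_channel_selection`, `vacancy_probs`, `alpha`) chosen only for this example.

```python
import math
import random


def ucb1_channel_selection(vacancy_probs, n_slots, alpha=2.0, seed=0):
    """Simulate a secondary user choosing one channel per slot with UCB1.

    vacancy_probs: assumed per-channel probability of being vacant (simulated
                   stand-in for the primary network's traffic).
    Returns (counts, estimates): how often each channel was sensed and the
    empirical vacancy rate learned for it.
    """
    rng = random.Random(seed)
    k = len(vacancy_probs)
    counts = [0] * k      # number of times each channel was sensed
    vacancies = [0] * k   # number of times it was found vacant

    for t in range(1, n_slots + 1):
        if t <= k:
            # Initialization: sense every channel once.
            channel = t - 1
        else:
            # Pick the channel with the largest upper confidence bound.
            channel = max(
                range(k),
                key=lambda i: vacancies[i] / counts[i]
                + math.sqrt(alpha * math.log(t) / counts[i]),
            )
        # Simulated sensing outcome: reward 1 if vacant, 0 if occupied.
        vacant = rng.random() < vacancy_probs[channel]
        counts[channel] += 1
        vacancies[channel] += int(vacant)

    estimates = [v / c for v, c in zip(vacancies, counts)]
    return counts, estimates


if __name__ == "__main__":
    counts, estimates = ucb1_channel_selection([0.1, 0.5, 0.9], n_slots=5000)
    print("selections per channel:", counts)
    print("estimated vacancy:", [round(e, 3) for e in estimates])
```

In such a simulation the selection counts should concentrate on the channel with the highest vacancy probability as the number of slots grows, which is the behavior the learning-speed and convergence-accuracy measurements are meant to capture. The WD algorithm is not sketched here because the abstract does not specify its update rule.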
